On-site analysis system with central processor and method of analyzing

ABSTRACT

A method of analysis, analysis system, program product, apparatus, and method of supplying analysis of value incorporating the use of at least one data acquisition device, a central processor, and a communication link that is connectable between the data acquisition device and the central processor. The central processor is loaded with multivariate calibration models developed for predicting values for various properties of interest, wherein the calibration models are capable of compensating for variations in an effectively comprehensive set of measurement conditions and secondary material characteristics. As so configured, the calibration models can compensate for instrument variance without instrument-specific calibration transfer. Measurement results generated by the central processor can be transmitted to an output device of a user interface.

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the benefit of two U.S. Provisional Patent Applications, Serial Nos. 60/307,347 and 60/307,348, both filed on Jul. 23, 2001, the disclosures of which are incorporated by reference herein. This application is also related to U.S. patent application Ser. No. ______, filed on even date herewith by James Thomas Kent, et al., entitled “EXTENSIBLE MODULAR COMMUNICATION EXECUTIVE WITH ACTIVE MESSAGE QUEUE AND INTELLIGENT MESSAGE PRE-VALIDATION,” the disclosure of which is incorporated by reference herein.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

[0002] Not applicable.

FIELD OF THE INVENTION

[0003] The present invention is generally directed to a process for analyzing materials, preferably at multiple locations. More particularly, the present invention relates to a method of rapid analysis which utilizes analytical instrumentation, for example spectroscopic sensor units, on site to transmit and receive information to and from a central processor. The invention also relates to a system for acquiring data from a sample at remote site locations concerning a property of interest of that sample, transmitting the data to the central processor for data analysis, and receiving and optionally displaying processed information at those locations. Further, the invention relates to a method of providing analysis services to customers from a central processor.

BACKGROUND OF THE INVENTION

[0004] There are numerous instances where one or more properties of a material are preferably analyzed at one or more locations removed from an analytical laboratory where testing would normally be conducted. For example, agricultural products may be analyzed for the presence and concentration of certain components during the crop growing stage, at harvesting, during transportation, or after the product has been stored, as at a grain silo. Other non-limiting instances where this type of analysis would be useful include applications in the processed food industry, the mining industry, the chemical industry, the finished hard goods industries, and a variety of service, retail sales, and medical industries.

[0005] In the absence of equipment or skilled personnel for conducting sample analysis at the location of the sample, substantial time delays can result in initiating and completing an analysis. Thus in the case of the analysis of an agricultural product such as an oil seed which would be harvested within a narrow window of time, traditionally specific characteristics of the seed are determined by a laboratory. This is due to the fact that the equipment and skilled personnel generally required to conduct such analyses are not normally available to the farmer or even to the silo operator. Thus, if an oil seed is to be analyzed, a sample is taken from the farmer's truck or from the silo operator and sent to an independent laboratory for analysis. It is not uncommon in this situation for the sample to require one day for forwarding to the laboratory, two days for the laboratory to conduct the analysis, and an additional day for the results to be returned to the silo operator. Thus, a particular lot of oil seed may require four days to be analyzed. Where the value of the oil seed is dependent on the analysis, utilizing laboratory analysis results in substantial delays to the farmer in obtaining a value for his crop, to the silo operator in valuing the crop and determining the market into which the seed will be sold, and to the ultimate purchaser of the oil seed. When one considers that in the case of oil seed harvesting the oil seed crop typically is processed within a narrow window of time over a wide geographic area, the individual delays described above become multiplied at each silo within the crop growing region.

[0006] Alternatively, it is known to analyze certain components of a particular agricultural product at the location where the material is either grown, harvested, transported, or stored. Nevertheless, this resultant analysis of the product may not be directly comparable to an analysis of the same agricultural product in a different location, even though using the same methodology. Even when the same sample is analyzed at different locations, differences in analytical results may arise, for example, because of a difference in environmental conditions at one analysis location relative to the other or because of a difference in the performance of the analyzers. Results may also differ because of variations in the procedure of presenting the sample to different analyzers.

[0007] It may be convenient or necessary for on-site analyzers to be able to be easily transported from one location to another. A portable sensor unit or spectrometer is one that is sufficiently compact and robust to permit it to be transported to alternate testing locations as needed. These units are able to be removed from service and returned to service quickly for transportation to and use at a desired site for analysis. The analytical instruments for such analysis must be rugged and capable of making repetitive analyses without extensive recalibration by a skilled operator and with little or no variation over the course of use of the unit.

[0008] Because the analysis of, for example, a particular agricultural product may need to be determined at locations over a wide geographic area within a narrow time frame, it may become impractical to conduct the analyses using only one instrument. Generally it then becomes necessary to test these products at multiple sites with multiple analyzers. Under these circumstances, each of the analyzers must be calibrated so that the output results from the various analyzers can be properly compared. Depending on the type of analysis, with traditional techniques the analytical results of a particular agricultural product using multiple analyzers may vary because of different effects resulting from the environmental, instrument, or sample presentation variations discussed above. To address this, either each analysis should be conducted under the same environmental and sample presentation test conditions, or each analysis should be completed with the ability to compensate for differences in temperature, humidity and other relevant environmental variations in generating data by the individual analyzers. Further, results generated by different analyzers may differ because of inherent manufacturing differences between the highly sensitive instrument components and differences in the precise assembly of the components, the differences becoming more pronounced over time producing instrument drift. As a result, no two analytical instruments are precisely identical to one another, so accommodation must be made in considering the results generated by an analyzer when comparing the results with those from other units. While work has been done to develop practical methods for transferring multivariate calibrations between instruments, for example as discussed in U.S. Pat. No. 5,459,677, these methods require that some instrument, sometimes called a reference instrument or a master instrument, be maintained in some known or reproducible state or be capable of being brought into a reproducible and well defined state to achieve instrument standardization. Then, a master calibration model developed on one instrument can be transferred to a number of target instruments. However, all calibration transfer and instrument standardization methods require additional steps to be taken at various times potentially over a range of time intervals after the initial calibration transfer. For example, the analyst may have to evaluate a set of calibration transfer samples on the target instruments after the initial calibration transfer, usually by a skilled operator, and adjust either the models or the instruments so that the response from the target instruments agrees with the response from the master or reference instrument. Further, the measurement conditions of the material samples being analyzed, for example the sample temperature, may also be different at the various sites. Again, accommodations must be made in considering the results generated from those material samples.

[0009] Another matter to be considered in conducting remote analyses of materials such as agricultural products is the amount and quality of information desired from the analysis, and the demands placed on the analyzer. Generally, as the analyzer is able to perform more sophisticated analysis, the analyzer itself becomes more complex, a higher level of training is required to operate the analyzer and interpret the results, and the weight and size of the instrument may increase as a result. An analyzer capable of undertaking more complex analyses is generally more susceptible to damage and to generating inaccurate results by the process of moving the analyzer from site to site, utilizing the analyzer under varying conditions, and the like. Consequently, the results from such an instrument are more likely to change and thus render comparison between various remote analyzers more difficult or even impossible.

[0010] The need to be able to generate comparable, statistically equivalent analyses of materials at remote site locations can extend to a wide range of materials in addition to agricultural products such as, but not limited to, manufactured products, natural phenomena, ores, renewable raw materials, fuels, and living tissue.

[0011] The combination of a calibration model with an analytical instrument to generate a predicted result has been practiced. It is known to use, for example, calibration models associated with near-infrared, mid-infrared, and Raman spectrometers in commercial processes to monitor the status of chemical reactions. This monitoring capability can involve the generation of results from an analytical method with the application of statistical analysis and calibration models to interpret and quantify the data. For example, in the manufacture of carboxylic acids and derivatives from fats and oils, it is known to use near-infrared spectrometers loaded with the appropriate chemometric software to measure a number of properties of the carboxylic acids and their derivatives. This monitoring can be done during the manufacturing process on intermediate product, as well as on the finished product. The spectrometer can be operated in a stand-alone mode with the operator bringing samples to the spectrometer for at-line analysis. Alternatively, the spectrometer can be connected in-line to enable monitoring of the process stream as the manufacturing operation proceeds. Thus, two commercially available near-infrared spectrometers such as the Bomem MB-160 FT-NIR spectrometer loaded with HOVAL software (such as Version 1.6, 1992) and AIRS software (such as Version 1.54, 1999) from Bomem Inc., Canada, and the Bruker Vector 22/N spectrometer loaded with OPUS-NT Quant-2 software (such as Version 2.6, 1999) from Bruker Optik GmbH, Germany have been used to analyze intermediate and finished carboxylic acid products for acid value, iodine value, titer, viscosity, hydroxyl value, saponification value, composition of fatty materials and derivatives, and for the presence of carboxylic acid methyl ester contaminants in a specific carboxylic acid.

[0012] The calibration models for evaluating the above properties were derived from the Grams-PLS plus (Version 3.01B, 1994, Galactic Industries Corporation) and Bruker OPUS Quant-2 software. In those instances where more than one data acquisition device was used to generate predicted results for a particular property of interest, individual calibration models were developed for corresponding individual instruments or a master calibration model was developed on a particular master instrument, transferred to one or more other instruments, and adjusted with instrument-specific correction factors to standardize the predicted results across multiple instruments.

[0013] In determining the chemical properties of incoming raw materials such as tallow, coconut oil and palm kernel oil for the production of carboxylic acids, near-infrared spectrometry with appropriate chemometric techniques such as the partial least squares (PLS) method has been used to evaluate the free carboxylic acid content of the starting materials, as well as iodine value and moisture content. The near-infrared monitoring can also be used to monitor the progress of the transesterification process utilizing fatty triglycerides and methanol as reactants. A near-infrared spectrometer connected to transesterification process equipment can also monitor free glycerine content, bound/combined glycerine content and methyl ester concentration. Alternatively, samples can be taken during the progress of the reaction to a stand-alone near-infrared spectrometer loaded with appropriate calibration models for off-line analysis. In connection with the monitoring of the progress of a reaction, the near-infrared spectrometer can utilize a fiber optic probe connected to the spectrometer by fiber optic cable. The use of the near-infrared spectrometer in combination with the application of modeling software permits analysis of particular chemical species during the progress of chemical processing, as well as at the conclusion of the chemical process. Spectrometers such as near-infrared operating in the in-line mode are capable of providing data substantially on a real time basis. Data generation in these instances occurs under tightly controlled test and environmental conditions and involves one or more probes connected to a single instrument connected to a single data processing unit.

[0014] There is presently a high interest in the analysis of agricultural products. Genetically modified materials are of particular interest. The grain and food distribution segments in agriculture have expressed significant need for analytical technology to meet market requirements to identify and quantitate genetically modified crops, especially corn and soybean, in world markets. This need has developed rapidly. U.S. farmers have increasingly accepted crops derived from genetic engineering after the success they experienced in the 1996 growing season. The U.S. Department of Agriculture estimated that approximately 25% of U.S. corn and 54% of U.S. soybeans produced in 2000 were grown from genetically engineered seed with input traits to provide resistance to herbicides, insects, or both. The composition of such input trait crops is generally macroscopically indistinguishable from similar crops without the corresponding input traits.

[0015] In contrast, the foods of the future which will incorporate improvements of direct benefit to the consumer likely will be based at least in part on crops having enhanced output traits. The composition of these enhanced crops is different from the corresponding conventional crops. Examples include high oil corn, high sucrose soybeans, and low linolenic canola. Genetically-enhanced crops can be produced either by genetic engineering, as enabled by recent advances in biotechnology, or by specially designed traditional breeding programs. Even traditional crop improvement practices can result in plants with changed genetics and enhanced properties.

[0016] A driver of the growth in, and the need for, analytical technology for agricultural products has been the promulgation of labeling regulations adopted in many regions of the world including the two largest agricultural commodity trading communities, the European Union and Japan. These labeling requirements have required or are expected to require food processors to label finished food products as to the genetically modified content of the ingredients used to produce these products. The initiation of labeling and the growing number of food processors electing to use raw materials which have not been genetically modified are driving the need for identity preservation.

[0017] Labeling specifications are nearing completion in both Europe and Japan. Identifying the genetic composition of grain in commercial crops and maintaining that identity throughout the agricultural complex to support labeling has become a high priority for seed companies, commercial growers, distribution and process companies, as well as food processors and is expected to increase as labeling is further implemented in the future. Consequently there is a need to provide an economical and efficient way to analyze seeds and crops at various locations along the supply chain, to identify and quantify the chemical composition and potentially other measurable properties of one or more output traits in genetically enhanced as well as conventional crops.

[0018] The interest in obtaining detailed analysis of agricultural products extends also into areas involving analysis of other materials. For those other materials as well, there remains a need for an economical and efficient way of analyzing materials on site at remote locations to identify and quantify their chemical compositions and other properties of interest.

BRIEF SUMMARY OF THE INVENTION

[0019] The present invention is directed to a process for identifying and quantifying one or more properties of interest of a material, the process involving providing a material to be analyzed; providing one or more data acquisition devices capable of acquiring data for prediction of one or more properties of the material; providing a central processor capable of computing one or more predicted results using multivariate calibration models and storing a database of multivariate calibration models; providing a communication link between data acquisition devices and the central processor; and analyzing the material using the data acquisition devices and the central processor in order to obtain results on one or more properties of interest. Preferably the central processor stores at least a portion of either measurement data, measurement results, or both. Preferably the data acquisition devices are capable of being transported from site to site. The calibration model is multivariate, and compensates for an effectively comprehensive set of measurement conditions and secondary material characteristics. Preferably, the communication link is capable of providing resultant information from the central processor to a user interface in the vicinity of the sensor. However, the resultant information may be conveyed by other means, such as by telephone communication, or may be conveyed by the same type of communication link as available between data acquisition device and central processor but to a location removed from the data acquisition device.

[0020] The invention is also directed to an analysis system comprising a central processor loaded with one or more calibration models, at least one data acquisition device connected to the central processor to supply input information, and a user interface to initiate data acquisition and to optionally provide an indication of the results generated by the central processor. A user interface may receive user input and present results to users using any known user-interface technologies, where user input can be provided by a keyboard, mouse, touchscreen, voice recognition, dedicated buttons, and the like, and presentation of results by a visual display, speech synthesis, printed pages and the like. A user interface may comprise multiple units or may be incorporated into a single device, including into the same device as a data acquisition device. Furthermore, user input may be received by multiple devices, and/or multiple devices may provide output to a user. Generally, the central processor includes a data repository to store at least resultant information, but the repository may also include at least a portion of the information acquired at the data acquisition device. The invention also encompasses a method for providing analysis of value to a customer.

[0021] Understanding of this subject matter is facilitated through the defining of certain key terms. The property of interest of a sample of material being analyzed is referred to herein as the primary variable. This variable is distinguished from other variables which influence the instrument response, which are identified as secondary variables.

[0022] Analytical instruments do not generate the values of a property of interest directly. Rather, measurement signals such as voltage or current are generated by the instruments. These signals may be pre-processed by transforming the raw data using instrumental transduction and computation steps performed by internal electronics, computer circuit boards, and the like to a form more readily assimilated by subsequent processing steps. These optionally pre-processed measurement signals are called the instrument output or the instrument response. An instrument is unable to analyze a sample of a material without the creation of a statistically valid relationship between the instrument response and known values for the property of interest of the material. The process of developing such a relationship is known as calibration.

[0023] The collection of all variations in secondary variables resulting from variations in the performance of one or more instruments is the instrument variance. Instrument variance encompasses variations both in the instrument components and in the assembly of the components, and encompasses variations which develop over time, but does not encompass operator variation in the use of the instrument, or other factors distinct from the instrument hardware which affect the observed value.

[0024] The actual mathematical relationship between the instrument response and the property of interest of a material is called the calibration model. To develop the model, the particular analytical instrument, employing a particular analytical method, is trained to measure a particular property of interest through development of a mathematical relationship between the instrument response and known values of the property. Experimental data related to the property of interest are generated by recording values of the property of interest determined by a reliable independent method on a particular group of samples. These recorded values are called known values. It is recognized that the experimentally determined values of the known data are characterized by experimental uncertainties, so the known values are not known exactly.

[0025] The group, or set, of samples of a material with known values of the property of interest used to develop this calibration model is called a calibration set. Variations in the material characteristics that are expected to be present in the population of samples that will be analyzed in the future should be represented by samples in the calibration set. This calibration set, or more typically a subset thereof, is then used to generate a collection of instrument responses over a range of measurement conditions for evaluating the property of interest. The collection of known values and instrument responses generated from the calibration set over a range of measurement conditions is a data set called the training set. Because each sample in the calibration set may be subjected to a range of conditions involving a number of secondary variables, as well as repeated measurements under the same conditions, the training set may contain more values than the number of samples comprising the calibration set. A training set generally encompasses both variations in a range of material characteristics and variations in a range of measurement conditions that are expected to be present during actual on-site analyses in the future.
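The relationship between a calibration set and the larger training set derived from it can be illustrated with a minimal sketch. The sample identifiers, known values, temperature levels, instrument names, and replicate count below are hypothetical and chosen only for illustration; the sketch enumerates the planned observations and does not generate instrument responses.

```python
from itertools import product

# Hypothetical calibration set: sample id -> known value of the property of interest.
calibration_set = {"s1": 18.2, "s2": 20.5, "s3": 22.1}

# Hypothetical levels of two secondary variables (measurement conditions).
temperatures_c = [10, 25, 40]
instruments = ["unit_A", "unit_B"]
replicates = 2  # repeated measurements under the same conditions


def build_training_plan(samples, temps, units, n_rep):
    """Return one planned observation per (sample, temperature, unit, replicate)."""
    plan = []
    for (sample_id, known), temp, unit, rep in product(
        samples.items(), temps, units, range(n_rep)
    ):
        plan.append(
            {
                "sample": sample_id,
                "known_value": known,
                "temperature_c": temp,
                "instrument": unit,
                "replicate": rep,
            }
        )
    return plan


training_plan = build_training_plan(
    calibration_set, temperatures_c, instruments, replicates
)
print(len(calibration_set), "samples ->", len(training_plan), "observations")
```

Each entry in the plan would be paired with a measured instrument response to form one observation of the training set, which is why the training set can hold many more values than there are calibration samples.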

[0026] As used herein, secondary material characteristics are material characteristics other than the property of interest that may influence the instrument response. The primary material characteristic is the property of interest. The collection of all variations in secondary variables resulting from variations in the secondary material characteristics of one or more samples of a material is the sample variance. Sample variance encompasses variations both within samples and between samples of a material. Variations in the primary material characteristic are encompassed by particular values in the training set spanning the expected range of the property of interest.

[0027] A validation set is another data set of known values and instrument responses generated from the calibration set, or more typically a subset thereof, or from a new set of samples of the same material with known values over a range of measurement conditions, such that this data set is usually distinct from the training set. The validation set is used to test the predictability of the calibration model that was developed using the training set. As used herein, an instrument-response set is defined as a data set that can be used as either a training set or a validation set.

[0028] The known values corresponding to the property of interest of the samples used to generate an instrument-response set may be determined by measurements using a validated analytical technique herein referred to as a primary analytical method. These known values are considered to be observed values that are suitable for use as reference data in developing a calibration model for a secondary analytical method. As used herein, the known or observed values corresponding to the property of interest for a set of calibration samples will be treated in the same manner regardless of whether the values were those of reference standards or the values were measured by a primary analytical method.

[0029] A remediation update is a new calibration model that is developed by generating new instrument responses for a previous training set without necessarily modifying the calibration set or the particular levels of measurement conditions of the previous training set in order to re-establish previously attained calibration statistics. Remediation updates are often needed to compensate for changes in instrument variance over time. Since the usable lifetime of calibration samples may be shorter than the time interval between recalibrations, a remediation update may be generated from newly prepared samples of the same material covering a similar range of material properties as those of a previous calibration set.

[0030] In comparison, an enhancement update is a new calibration model that is developed from a training set which has been altered from a previous training set for the purpose of improving some prediction capability of a model. An enhancement update may be developed from combinations of an extended training set that includes additional calibration data to span a wider range of either sample characteristics or measurement conditions, or both; a corrected training set that includes modified calibration data or excludes erroneous calibration data to correct errors discovered in a previous training set; or an improved training set that includes new calibration data, some of which may replace corresponding values in a training set, where the new values may result from improvements in a primary analytical method used to generate better estimates of known values or from an improved representative sampling of the material. An improved training set may also be developed by selecting a different set of calibration samples or measurement conditions for the new training set, where the number of observations may be greater than, the same as, or less than that used in a previous training set.

[0031] A global training set is a training set configured to compensate for an effectively comprehensive range of variation in the secondary variables in connection with predicting the property of interest from a calibration model.

[0032] The term compensation as used herein is defined as the reduction or elimination of the impact of variation in one or more factors on the predicted result.

[0033] If the output of an analytical instrument depends only on the property of interest, a univariate calibration model would be created. In practice, a univariate calibration model is rarely developed because a number of additional variables are usually encountered which affect the instrument response. Such other variables may include impurities in the sample and the temperature of the sample. Where more than one variable affects the instrument response, the calibration model generated to interpret the instrument response is considered multivariate. Thus, a multivariate model of the instrument response R having n variables is defined mathematically as

$$R = f(x_1, x_2, \ldots, x_n),$$

[0034] where $x_1$ is the property of interest and $x_2$ to $x_n$ are the additional $n-1$ variables that affect the instrument response $R$.

[0035] The objective of calibration model development is to predict the instrument response from known values of the property of interest in the presence of variations of the secondary variables. A training set for the calibration model is developed by generating instrument responses at different levels spanning the expected ranges, or a portion of these ranges, of the primary and secondary variables. In the development of a calibration model, the instrument response is a dependent variable, and the primary and secondary variables are identified as independent. The dependent variable is the presumed effect or response to a change in the independent variables. Independent variables are also known as predictor variables.

[0036] After the calibration model is developed, it is rearranged to express the property of interest as a function of the instrument response and the secondary variables. The measurement process predicts, or measures, the property of interest by using the analytical instrument response generated for an unknown sample as input to the calibration model. The primary objective of calibration model use is to predict the property of interest from the instrument response in the presence of variations of the secondary variables. In actual situations where the calibration model is multivariate, the relationship of the property of interest to the instrument response is affected by the presence of secondary variables, which include all other factors that will significantly influence the instrument response. These secondary variables may be described alternatively as influential factors, interfering factors, or contaminating factors in the measurement process.
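The rearrangement described above can be illustrated with a deliberately simple sketch. It assumes a linear model with a single secondary variable, R = b0 + b1·x1 + b2·x2, and made-up coefficients; none of these specifics come from this disclosure, and practical multivariate calibrations (for example PLS on full spectra) are considerably more involved.

```python
# Hypothetical coefficients of an assumed linear calibration model.
B0, B1, B2 = 0.5, 2.0, 0.1


def forward_response(x1, x2):
    """Calibration form: instrument response R as a function of the
    property of interest x1 and one secondary variable x2."""
    return B0 + B1 * x1 + B2 * x2


def predict_property(response, x2):
    """Rearranged form: property of interest x1 as a function of the
    instrument response and the secondary variable."""
    return (response - B0 - B2 * x2) / B1


# Round trip: simulate a response for a sample, then recover the property.
true_x1, temperature = 20.0, 25.0
r = forward_response(true_x1, temperature)
print(predict_property(r, temperature))
```

The round trip shows why the secondary variable matters: feeding the same response into `predict_property` with a different value of `x2` would shift the predicted property.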

[0037] The quality of a calibration model can be described by calculating the degree of correlation between the known values and the corresponding predicted results using a training set or a validation set. One such process is called cross-validation, wherein the same observations of a training set are used for two different purposes, model building and validation. As used herein, an observation is the data corresponding to a single measurement process. One version of cross-validation is known as the leave-one-out method, involving a training set of M observations with repetition of building the calibration model M times. Each time, one observation (the i-th) is excluded and the remaining M−1 observations are used to build the i-th calibration model, where i ranges from 1 to M. The excluded observation is then used to validate the i-th model, whereby the i-th residual is computed as the difference between the i-th predicted value and i-th known value.
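The leave-one-out procedure can be sketched as follows. For brevity the sketch assumes a one-variable straight-line model fitted by ordinary least squares as a stand-in for a multivariate calibration model; the function names and data values are hypothetical.

```python
def fit_line(responses, known_values):
    """Ordinary least-squares fit of known_value = slope * response + intercept."""
    n = len(responses)
    mean_r = sum(responses) / n
    mean_k = sum(known_values) / n
    sxx = sum((r - mean_r) ** 2 for r in responses)
    sxy = sum((r - mean_r) * (k - mean_k) for r, k in zip(responses, known_values))
    slope = sxy / sxx
    return slope, mean_k - slope * mean_r


def leave_one_out_residuals(responses, known_values):
    """Build M models, each excluding one observation, and return the M
    residuals (i-th predicted value minus i-th known value)."""
    residuals = []
    for i in range(len(responses)):
        train_r = responses[:i] + responses[i + 1:]
        train_k = known_values[:i] + known_values[i + 1:]
        slope, intercept = fit_line(train_r, train_k)
        predicted = slope * responses[i] + intercept
        residuals.append(predicted - known_values[i])
    return residuals


# Hypothetical training set of M = 5 observations.
responses = [1.0, 2.0, 3.0, 4.0, 5.0]
known = [2.1, 3.9, 6.2, 7.8, 10.1]
for i, res in enumerate(leave_one_out_residuals(responses, known), start=1):
    print(f"residual {i}: {res:+.3f}")
```

Because every residual comes from a model that never saw the excluded observation, the set of residuals gives a less optimistic picture of prediction error than residuals computed from a single model fitted to all M observations.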

[0038] The statistical measure of the degree of correlation between the known values and the corresponding predicted results of the data set is the square of the correlation coefficient, known as r². The square of the correlation coefficient is also known as the coefficient of determination, which measures the fraction of the variation in the dependent variable about its mean that is explained by variation in the independent or predictor variables. The total variation in a set of predicted values is the sum of two parts, the part that can be explained by the model and the part that cannot be explained by the model. The ratio of the explained variation to the total variation, or r², is a measure of how good the model is. The r² statistic is therefore a measure of the strength of the relationship between the observed and predicted values. The values of r² range from 0 to 1, and cover the range of no correlation up to perfect correlation. In practice, these two extremes are rarely if ever encountered. High values of r² indicate that the model tends to determine its predictions with small errors. R² is defined as 100 times r², so the values of R² range from 0 to 100. Additional information on these and related statistical terms can be found in “Multivariate Calibration”, 1989, John Wiley & Sons, Ltd. by Harold Martens and Tormod Naes; and “Chemometric Techniques for Quantitative Analysis”, 1998, Marcel Dekker, Inc. by Richard Kramer, both texts of which are incorporated herein by reference.
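
As a minimal sketch, r² can be computed as the square of the Pearson correlation coefficient between the known and predicted values, and R² as 100 times r². The function names are illustrative assumptions.

```python
def r_squared(known, predicted):
    """Square of the Pearson correlation coefficient between two sequences."""
    n = len(known)
    mk = sum(known) / n
    mp = sum(predicted) / n
    skp = sum((k - mk) * (p - mp) for k, p in zip(known, predicted))
    skk = sum((k - mk) ** 2 for k in known)
    spp = sum((p - mp) ** 2 for p in predicted)
    return (skp * skp) / (skk * spp)


def r_squared_percent(known, predicted):
    """R² as defined in the text: 100 times r², so it ranges from 0 to 100."""
    return 100.0 * r_squared(known, predicted)
```

Both sequences must contain some variation; otherwise the denominator is zero and the statistic is undefined.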

[0039] Samples with known values of the property of interest are called known samples. These samples are distinct from other samples called unknown samples, for which values of the property of interest may not be known.

[0040] It is possible that the measurement of an unknown sample may generate a value which is not a valid prediction by the existing calibration model. This invalid predicted value is called an outlier. An outlier may be an observed measurement which deviates so much from other observations as to arouse a suspicion that the sample was taken from a population distinct from that used to create the calibration model. Alternatively, an outlier may be a false positive observation which does not deviate noticeably from valid observations but belongs to an overlapping, contaminating population. An outlier is always a value that is distinct from a basic population of valid predictions of a property of interest. Generally it is expected that large gaps would be noted between observations of outliers relative to observations of acceptable values which fall within the basic population. However, the distinction between acceptable and outlier observations is not always clear-cut because contaminating distributions can overlap the basic distribution. In practice then, it is expected that not all outliers will be identified. The determination of an outlier is done with a statistical probability rather than with certainty.

[0041] Some outliers may be caused by invalid measurements, such as when an instrument malfunctions to produce an abnormal spectrum or when an incorrect type or insufficient quantity of sample is measured. Other outliers may result from the inadvertent use of erroneous values of known data in a training set or a validation set. Outliers caused by invalid measurements or erroneous calibration data are called bad outliers and the corresponding predicted values should be considered to be invalid results. In other situations, outliers may be valid measurements of samples or measurement conditions which fall outside the range of primary or secondary variables spanned in the training set. In still other situations, outliers may result from valid measurements in which some previously unidentified secondary variable has become an influential factor. In these latter two situations, such outliers are called good outliers since they identify opportunities to extend the training set to cover a wider range of samples and measurement conditions. While the predicted values of good outliers should also be considered to be invalid results, if a new calibration model is developed by including a range of such good outliers in the training set, future measurements will be capable of generating valid results over a wider range of samples and measurement conditions.

[0042] There is no unequivocal test to determine whether an observation is an outlier. The possibility of an observation being an outlier is determined by the type of test used to evaluate that observation. In a situation where the distance of an observed data point from a central measure of a population can be an indication of an outlier, the operator can employ the Mahalanobis distance to determine the existence of a possible outlier. The Mahalanobis distance is the scalar distance between a multivariate observation and the centroid of a multivariate distribution that takes into account the actual spatial distribution of values in multidimensional space. A typical cutoff point or threshold value for determining if an observation is considered to be a probable outlier is a Mahalanobis distance that is often found to be in the range from 0.1 to 1. At values lower than the threshold, there is typically insufficient reason to exclude the observation. Such observations are presumably valid. Depending on the desired probability of detection, the threshold value of the Mahalanobis distance value which would be an indicator of a probable outlier may be adjusted downwardly or upwardly to increase or decrease the probability of detection. However, increasing the probability of detecting an outlier also increases the probability of rejecting valid observations. Alternatively, a recommended threshold value for detecting a probable outlier can be computed by chemometric software such as OPUS Quant-2 from the observed distribution of Mahalanobis distances in the training set. The reference method used for determining Mahalanobis distance (MAH) was ASTM E1790-96.
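
For illustration, the textbook form of the Mahalanobis distance can be sketched for two-dimensional observations. This is an assumption-laden example: the scaling applied by the chemometric software and reference method named above may differ from the textbook form, so the numeric thresholds quoted in the text should not be applied directly to the values this sketch returns. The function name is hypothetical.

```python
import math


def mahalanobis_2d(point, data):
    """Distance of a 2-D point from the centroid of `data`, scaled by its covariance."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]] of the training data.
    sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mx, point[1] - my
    # d^2 = [dx dy] * inverse(covariance) * [dx dy]^T, written out for the 2x2 case.
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)
```

Unlike a plain Euclidean distance, this measure shrinks along directions in which the training data vary widely and grows along directions in which they vary little.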

[0043] It has been found that different threshold values of the Mahalanobis distance can be used to classify probable outliers into a small number of different categories according to the probable reason that the observed value is a probable outlier. Thus, for example, Mahalanobis distances in the range from about 0.4 to about 1.0 often correspond to good outliers, while Mahalanobis distances greater than about 1 often correspond to bad outliers. Furthermore, when the Mahalanobis distance is considerably greater than 1, for example greater than 100, the outlier is extremely bad and often corresponds to a major instrument malfunction such as a non-emitting excitation source or the complete absence of a sample during data acquisition. When the Mahalanobis distance is in the range from about 1 to about 100, the bad outlier often corresponds to smaller problems with the sample, the instrument, the environment or the sample presentation, such as when a sample is present during data acquisition but is of a different material from that used to develop the calibration model or when an inadequate amount of sample is detected. The threshold value for bad outliers is generally at least twice the magnitude of the threshold value for detecting a probable outlier, which herein is called the threshold value of good outliers. The threshold value for extremely bad outliers is generally about 100 times the magnitude of the threshold value for bad outliers.
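
The categories described above can be sketched as a simple threshold test. The default values are the approximate example figures from the text (about 0.4, about 1 and about 100); the function name and category labels are illustrative assumptions, and in practice the thresholds would be tuned for each calibration model.

```python
def classify_outlier(distance, good=0.4, bad=1.0, extreme=100.0):
    """Map a Mahalanobis distance to a category using approximate example thresholds."""
    if distance > extreme:
        return "extremely bad outlier"  # e.g. failed excitation source, no sample
    if distance > bad:
        return "bad outlier"            # e.g. wrong material, inadequate sample amount
    if distance >= good:
        return "good outlier"           # candidate for extending the training set
    return "presumably valid"


for d in (0.2, 0.7, 5.0, 250.0):
    print(d, classify_outlier(d))
```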

[0044] At a basic level, the on-site analysis system comprises one or more data acquisition devices, a central processor, and a communication link. The data acquisition devices are used primarily, though not necessarily exclusively, for data acquisition, and the central processor for data analysis. A packet of information transmitted from an individual data acquisition device to the central processor is called measurement data, and the packet of information transmitted from the central processor to an output device of a user interface in the vicinity of a data acquisition device or to a third party is called measurement results. The data acquisition devices typically are physically separated from the central processor, though two or more data acquisition devices may be at a single location. The data acquisition devices may be geographically separated from each other and the central processor by great distances, but this is not required. As used herein, the data acquisition devices broadly identify the group of devices which acquire information about a sample of a material. Preferably, the data acquisition devices acquire spectroscopic data, though other analytic mechanisms may be employed, such as via chromatography, mass spectrometry or emission detection. Preferably, the data acquisition devices are transportable and are capable of generating an instrument response from data acquired on samples of material at a number of remote locations. The data acquisition device may also include in a single unit the sample presentation device for providing a sample for data acquisition by a detector, and a local processor (such as a laptop computer) with a user interface for executing the steps necessary to generate a measurement result, with an optional output device to read out the measurement result.

[0045] The central processor as used herein is a computer system which stores one or more calibration models and manipulates data transmitted from one or more data acquisition devices using the calibration models to predict values for the property of interest of a material. The central processor is not necessarily a single entity, however, since it may reside on multiple computer servers or clustered servers, where some duplication may be provided for redundancy and other duplication may be provided to mirror servers in multiple geographic locations. The use of multiple servers also increases the processing capacity, i.e., the number of transactions which can be completed within a period of time. In practice, the central processor behaves as if it were a single entity at a central location. The group of redundant and mirrored processors is known herein as the central processor. Further, the database of calibration models stored in the central processor is preferably constructed to compensate for expected variations in an effectively comprehensive set of secondary variables to provide statistically equivalent results from different remote instruments over time without instrument-specific calibration transfers or remediation updates.

[0046] An advantage of the analysis system is the relationship of the multiple data acquisition devices to the central processor. The system can function with many data acquisition devices located at sites which may be far removed from each other geographically for measuring the properties of a single material, such as an agricultural or mineral ore product. However, the system can also encompass multiple data acquisition devices in the same room or building. The use of a central processor means that all data from each data acquisition device are being manipulated in the same way for predicting a property from a particular model.

[0047] The database or library of calibration models stored in the central processor can be modified as desired to provide enhancement updates, add models to expand the capabilities for analyzing new properties, or to delete models that are no longer needed. All modifications to the database of calibration models can be done without making any changes to the hardware or software of individual data acquisition devices. Thus, enhancement updates and new models installed in the central processor can be used to analyze measurement data from all data acquisition devices immediately after installation without the need to separately undertake any remedial action at each of the remote sites, and the measurement results from all data acquisition devices typically are stored in the central processor for subsequent reporting and data analysis. As discussed in more detail below, the influential variations which exist in the test environment, the sample presentation to the data acquisition device, the physical and chemical characteristics of the sample, or the individual data acquisition devices themselves are able to be taken into account in quantifying the particular property or identifying the particular substance being measured. Thus, for example, the quantified measurement of the concentration of a triglyceride in an oilseed in one part of the world can be directly compared with a triglyceride concentration of an oilseed at a different location. Even where only one data acquisition device is connected to the central processor, the improved calibration model permits analysis to be conducted over time without the need for remedially updating the model.

[0048] The central processor stores a database of calibration models; receives a plurality of data values from a single measurement process, these data acquired by data acquisition devices, typically spectroscopic, about the particular material of interest; computes values for one or more properties of interest of the material using algorithms or computational procedures to manipulate the data and generate predicted values from the calibration models; and forwards the resultant information which it has generated. Optionally, the central processor sends results back to a user interface of the individual data acquisition devices. The data acquisition devices and central processor preferably transmit information to each other over a communication link, though it is possible for information to be transmitted from the data acquisition device to the central processor using a different communication link than that used for transmitting results. Presently, it is preferred that the information be transmitted in digital form. The communication link broadly is one or more communication pathways, often in a communication network, and may include various combinations of, for example, hard-wired telephone lines, cables, optical fibers, a system of towers or satellites for wireless communication, radio equipment, or combinations sufficient to transmit a signal carrying the desired information between any location and a central processor.
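
The flow of measurement data and measurement results can be sketched as a pair of data structures and a small in-memory processor. This is only an illustrative sketch: all class, field and method names are hypothetical, the communication link is omitted, and the calibration model is represented by any callable that maps multichannel data to a predicted value.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class MeasurementData:
    """Packet sent from a data acquisition device to the central processor."""
    device_id: str
    property_name: str
    channels: List[float]  # multichannel data from one measurement process


@dataclass
class MeasurementResult:
    """Packet forwarded by the central processor."""
    device_id: str
    property_name: str
    predicted_value: float


class CentralProcessor:
    def __init__(self) -> None:
        # Library of calibration models, keyed by property of interest.
        self.models: Dict[str, Callable[[List[float]], float]] = {}
        self.history: List[MeasurementResult] = []

    def install_model(self, property_name: str,
                      model: Callable[[List[float]], float]) -> None:
        """Add or replace a model; no change is needed on any device."""
        self.models[property_name] = model

    def analyze(self, data: MeasurementData) -> MeasurementResult:
        model = self.models[data.property_name]
        result = MeasurementResult(data.device_id, data.property_name,
                                   model(data.channels))
        self.history.append(result)  # retained for later reporting
        return result


# Placeholder model (a plain average), standing in for a real calibration model.
processor = CentralProcessor()
processor.install_model("oil", lambda channels: sum(channels) / len(channels))
print(processor.analyze(MeasurementData("device-A", "oil", [1.0, 2.0, 3.0])))
```

Because every device submits to the same model library, replacing a model through `install_model` takes effect for all devices at once, which mirrors the behavior the text describes.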

[0049] The calibration model based on chemometric methods of multivariate analysis provides the capability for generating useful measurement information even where more than one secondary variable encountered during the data acquisition step of a single measurement process may vary simultaneously, independently, or both.

[0050] Traditionally, calibration modeling has focused on variations in the sample and sometimes on one or more secondary variables, but not an effectively comprehensive set of factors in constructing the model. The instant invention not only recognizes the effect of sample variation in connection with developing a calibration model, but also evaluates and compensates for effects due to variation in the environment, the instrumentation, and the sample presentation.

[0051] After the potential universe of factors which may affect the characterization of the sample has been identified, it is determined whether one or more factors can be eliminated from consideration by mathematical methods of data pretreatment, whether certain factors must necessarily be considered in connection with developing a training set for the calibration model, or whether certain factors must be controlled in a manner as might be done in a traditional methodology to reduce or eliminate variation during the measurement process. The invention thus takes into account all factors known to a reasonable analyst for characterizing the sample as well as other factors that have not been recognized previously, and then proceeds to minimize the effect of all influential factors to develop an acceptable revision of the calibration model. The inventive method thus initially takes into account a number of factors and variations within each such factor sufficiently wide to span the expected range of variations, and then attempts to compensate for these factors in the process of ultimately creating an acceptable calibration model which contains substantial improvements in predictability and which eliminates or substantially reduces effects such as from instrument drift over time relative to existing calibration modeling techniques.

[0052] One method for developing the calibration model involves proceeding in a stepwise fashion, initially compensating for a single secondary variable, comparing the data generated by the thus-created calibration model with data from a reference method, followed thereafter by compensating for a second secondary variable and determining if the correlation improves or if the predictability is acceptable according to some statistical criteria, and repeating this process until an acceptable prediction level is achieved in the presence of variations of all known influential factors. This stepwise process is useful in the feasibility stage to identify influential factors and develop appropriate compensation techniques. Using the extended training set developed in this manner, it becomes more efficient to develop subsequent model revisions by including all factors at one time. Alternatively, the calibration model is initially developed by compensating for, or by otherwise taking into account, a number of known relevant factors at one time. Under these circumstances it is possible that the calibration model will still need to be refined one or more times before an acceptable model exists. However, the number of refinement operations will typically be lower than the number encountered when only one variable is evaluated at a time. In both cases, the goal is to identify all statistically significant factors which might arise from variations in the sample, the environment, instrumentation, and sample presentation; eliminate those factors from consideration which respond to data pretreatment, or satisfactorily reduce their effect; control those factors which may produce greater instrument responses than those from variations in the property of interest; and incorporate into the training set of the calibration model representative data spanning a range of variations for those remaining factors which affect the property of interest.
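
The stepwise approach can be sketched as a loop that adds one secondary variable at a time and stops once a quality score is acceptable. This is a schematic illustration only: `evaluate` is a hypothetical caller-supplied function that would build a model compensating for the listed factors and return a score (for example R² against a reference method), and the scores in the usage example are invented toy numbers.

```python
def stepwise_development(candidate_factors, evaluate, acceptable):
    """Add one secondary variable at a time until the score is acceptable.

    Returns the list of factors compensated for and the last score obtained.
    """
    included = []
    score = None
    for factor in candidate_factors:
        included.append(factor)     # compensate for one more secondary variable
        score = evaluate(included)  # compare predictions with the reference method
        if score >= acceptable:
            break                   # acceptable prediction level reached
    return included, score


# Toy usage: the scores are made up solely to exercise the loop.
toy_scores = {1: 80.0, 2: 92.0, 3: 99.0}
factors, score = stepwise_development(
    ["temperature", "pathlength", "humidity"],
    lambda included: toy_scores[len(included)],
    acceptable=90.0,
)
print(factors, score)
```

The alternative described in the text, compensating for several known factors at once, corresponds to calling `evaluate` once on the full list and refining only if the score falls short.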

[0053] An example of an environmental variation is temperature change which results in samples having different temperatures at different times of measurement. This variation may affect the predictability of a calibration model generated where there has been no compensation for temperature, or if the measurement occurred at a temperature outside the range of temperatures represented by data in the training set. A number of other variations which may affect the calibration model can be introduced via the measuring instrument. The nature and number of these variations are a function of the type of instrument employed. For example, the incident radiation directed toward a sample using a near-infrared spectrometer has been found to vary as a function of the orientation of the source filament relative to the sample. Also, an older lamp generally does not provide the same light intensity as a new lamp. Concerning the sample presentation variations, the rate at which a sample passes through an incident light beam during a measurement process, or differences in the amount of sample in different measurement processes, may affect the predictability of the calibration model.

[0054] It should be recognized that the multivariate calibration models of the invention can be applied to any secondary measurement technique in which a data acquisition device generates multiple data values rather than a single numeric value for each measurement. The plurality of data values may be acquired as discrete values in a digital device, or they may be acquired in continuous fashion from an analog device and then converted into digital format. For example, a spectrum consists of a multitude of intensity values over a range of wavelengths. The spectral data may be acquired continuously or digitally. If the storage location of a single data value is referred to as a data channel, then a multichannel instrument is one for which acquisition of multiple data values may be considered to occur by storing the individual data values in a multitude of individual data channels. Thus, as used herein, a spectrometer that generates a continuous spectrum is a multichannel instrument, since the spectrum could be digitized and the multitude of discrete data values resulting from digitization could be stored in a multitude of individual data channels and could not be stored as a single data value in a single data channel. Hence, the invention applies generally to secondary measurements from multichannel instruments, where a secondary measurement is the prediction of a result from a multivariate calibration model of an analytical method. As discussed below, though the invention will be described in the context of NIR spectroscopy, it should be recognized that the invention can be practiced for any analytical method based on a multichannel instrument.

[0055] The samples that will be analyzed at remote locations may not be pure single molecular species, and thus will typically contain several components of which some may be contaminants and others may be distributions of molecular species as occur in synthetic polymers and natural products. Multichannel data on impure or multi-component samples can contain peaks showing considerable overlap. Chemometric methods of multivariate analysis such as partial least squares (PLS), principal component regression (PCR), artificial neural networks (ANN), and the like allow for determining the properties of interest of multiple components in each sample simultaneously. While the present invention will be described in terms of PLS, it should be understood that other chemometric methods of multivariate analysis can be used to construct calibration models. Multivariate calibrations make use not of a single data point but of data features over a range of data values, so analysis of overlapping bands or broad peaks becomes feasible.

[0056] PLS is one of a number of factor-based methods of multivariate analysis, where a factor space is an alternative coordinate system for a data set, a factor is an axis of the alternative coordinate system or factor space, and the dimensionality of a factor space is the number of axes or factors in the factor space. In a factor-based method of multivariate analysis, the axes of a factor space are selected to most efficiently span the data values in a manner that will capture as much as possible of the systematic variation in the data along a subset of axes or factors.

[0057] The total variation in a set of multivariate data is composed of two parts, the part caused by systematic variations in the primary and secondary variables and the part caused by random variation or experimental noise. It is usually found that some factors in a factor space contain only or mostly experimental noise, and such factors are therefore not related significantly to variations in the primary and secondary variables. Thus, it becomes possible to reduce the number of axes in a factor space relative to the number of axes in the original coordinate system of the data by omitting factors or axes that contain only or mostly experimental noise. The dimensionality of the factor space that is sufficient for predicting results to an acceptable level of precision is therefore generally smaller than that of the original coordinate system. The number of factors remaining after such a reduction in dimensionality is called the rank of the factor space. The rank characterizes the dimensionality of the fit of a calibration model to the data. It is generally preferred to avoid overfitting a model by selecting a smaller rank when possible.
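
One common way to act on the preference for a smaller rank is sketched below. The heuristic, its tolerance, and the function name are assumptions made for illustration and are not stated in the specification: given a cross-validation error for each candidate rank, the sketch picks the smallest rank whose error is within a chosen fraction of the minimum.

```python
def select_rank(cv_errors, tolerance=0.05):
    """Return the smallest rank whose cross-validation error is within
    `tolerance` (as a fraction) of the minimum error.

    cv_errors[k] is the (non-negative) error obtained with rank k + 1.
    """
    best = min(cv_errors)
    limit = best * (1.0 + tolerance)
    for k, err in enumerate(cv_errors):
        if err <= limit:
            return k + 1
    return len(cv_errors)  # not reached for non-negative errors; kept as a safeguard


# Toy errors: rank 4 has the lowest error, but rank 3 is nearly as good.
print(select_rank([0.90, 0.40, 0.205, 0.20, 0.22]))
```

Factors beyond the selected rank are treated as carrying mostly experimental noise and are omitted from the model.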

[0058] The invention is also directed to a method which can support the analysis needs of customers, particularly where the materials to be analyzed are in locations geographically removed from each other and the operators are not skilled in analytical methodology. The method may also support the customer's transactions, the underlying contract agreements, material rating and billing functions, and further data analysis of material. The method utilizes the analysis method and system hardware described herein, encompassing a collection of processing, infrastructure and software components that support multiple application models that involve collection of data, transmittal of that data over a communication link, analysis of that data by appropriate software applications to derive value from the data, storage of the analyzed data in a repository for generating historical statistics, identity preservation and tracking, auditing, forecasting and model improvement, and delivery of results back to the original submitter or an alternate location over the same or different communication link.

[0059] It is therefore an object of the invention to provide a calibration model for a property of interest which can accommodate an effectively comprehensive range of variations in one or more characteristics of the material to be analyzed, the instrument, the environment, and the sample presentation without need for remedial model updating.

[0060] It is a further object of the invention to provide an analysis system which utilizes calibration models capable of compensating for a range of secondary variables.

[0061] It is a further object of the invention to provide an analysis system which provides for at least one data acquisition device and a central processor in combination with a calibration model algorithm to be able to accommodate an effectively comprehensive range of variations of primary and secondary variables, which can generate analysis data on the material to be analyzed.

[0062] It is a further object of the invention to provide an analysis system which provides for multiple remote data acquisition devices and a central processor which can generate analysis data on multiple samples of materials remote from each other but which are each analyzed using the same calibration model algorithm for the particular property being measured.

[0063] It is a further object of the invention to provide an analysis system which incorporates a user interface in combination with the data acquisition device to provide analysis information generated by the central processor for a particular sample being measured at the location where the measurement is taken.

[0064] It is a further object of the invention to provide a method of analysis which permits the measurement of one or more properties of materials utilizing a single calibration model algorithm for each property at a central processor.

[0065] It is a further object of the invention to provide a method of analysis incorporating a calibration model algorithm that has been constructed to compensate for an effectively comprehensive set of expected variations in sample, sample presentation, environmental conditions and instrument.

[0066] It is a further object of the invention to provide a method of analysis which permits the measurement using multiple instruments of one or more properties of materials located in remote locations utilizing a single calibration model algorithm without incorporating instrument-specific parameters in the algorithm.

[0067] It is a further object of the invention to provide a method for analyzing materials at remote locations, e.g., for multiple customers, individuals, entities, or the like.

[0068] These and other objects and advantages of the invention are provided in the detailed description of the invention and in the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0069] FIG. 1 is a block diagram of an on-site analytical system.

[0070] FIG. 2 is a block diagram of components and information transmission within a data acquisition device.

[0071] FIG. 3 is a block diagram of system architecture.

[0072] FIG. 4 is a flowchart to establish model feasibility.

[0073] FIG. 5 is a flowchart of model development and use.

[0074] FIG. 6 is a flowchart of filter refinement.

[0075] FIG. 7 is superposed NIR spectra of six samples of squalane in squalene at different concentrations.

[0076] FIG. 8 is predicted versus observed values from the cross-validation of Model 1.0.

[0077] FIG. 9 is predicted versus observed values from the cross-validation of Model 1.1.

[0078] FIG. 10 is superposed NIR spectra for different environmental light intensities.

[0079] FIG. 11 is superposed NIR spectra for different sample cap orientations.

[0080] FIG. 12 is predicted versus observed values from the validation of Model 1.1 for different sample cap orientations.

[0081] FIG. 13 is predicted versus observed values from the cross-validation of Model 3.0.

[0082] FIG. 14 is predicted versus observed values from the validation of Model 3.0 for different sample cap orientations.

[0083] FIG. 15 is superposed NIR spectra for different sample pathlengths.

[0084] FIG. 16 is superposed transformed spectra after vector normalization pretreatment of the FIG. 15 NIR spectra.

[0085] FIG. 17 is predicted versus observed values from the cross-validation of Model 4.0.

[0086] FIG. 18 is predicted versus observed values from the cross-validation of Model 4.1.

[0087] FIG. 19 is predicted versus observed values from the cross-validation of Model 5.0.

[0088] FIG. 20 is background spectra at different humidities.

[0089] FIG. 21 is superposed NIR spectra of vapor phase water at different generated levels of humidity.

[0090] FIG. 22 is superposed NIR spectra of 1.00% squalane in squalene at different generated levels of humidity.

[0091] FIG. 23 is superposed NIR background and absorbance spectra of 1.00% squalane in squalene.

[0092] FIG. 24 is superposed transformed spectra after first derivative and vector normalization pretreatment of the FIG. 23 NIR spectra.

[0093] FIG. 25 is predicted versus observed values from the cross-validation of Model 7.0.

[0094] FIG. 26 is superposed NIR spectra of 2.00% squalane in squalene using different fiber optic probes.

[0095] FIG. 27 is superposed transformed spectra after first derivative and vector normalization pretreatment of the FIG. 26 NIR spectra.

[0096] FIG. 28 is predicted versus observed values from the validation of Model 7.0 for different instruments.

[0097] FIG. 29 is predicted versus observed values from the cross-validation of Model 9.0.

[0098] FIG. 30 is predicted versus observed values from the validation of Model 9.0 for Instrument B.

[0099] FIG. 31 is predicted versus observed values from the cross-validation of Model 10.0.

[0100] FIG. 32 is predicted versus observed values from the validation of Model 10.0 for Instrument B.

[0101] FIG. 33 is a schematic diagram of a flow-through sample presentation system attached to a FT-NIR spectrometer.

[0102] FIG. 34 is superposed NIR spectra of representative seed samples of a variety of canola.

[0103] FIG. 35 is predicted versus observed values from the cross-validation of Model 11.0.

[0104] FIG. 36 is predicted versus observed values from the cross-validation of Model 11.1.

[0105] FIG. 37 is predicted versus observed values from the cross-validation of Model 11.2.

[0106] FIG. 38 is an abnormal NIR spectrum resulting from a failed excitation source.

[0107] FIG. 39 is superposed NIR spectra of the same sample of canola seed taken using different sampling amounts.

[0108] FIG. 40 is a NIR spectrum of wheat.

[0109] FIG. 41 is superposed NIR spectra of different samples of a single variety of canola seed different from the canola variety of FIG. 34.

DETAILED DESCRIPTION OF THE INVENTION

[0110] The invention is directed in part to a method of characterizing a material for at least one property of interest comprising acquiring multichannel data on at least one multichannel signal from a sample of the material using a data acquisition device at a location, transmitting the multichannel data along a communication link to a central processor wherein the central processor includes at least one algorithm for manipulating the multichannel data and evaluating at least one multivariate calibration model which can accommodate a range of variables distinct from the variation in the property of interest of the sample, manipulating the multichannel data by the central processor to generate a result predictive of the property of interest of the sample, and forwarding the result from the central processor. The result is typically forwarded to one or more locations remote from the central processor. Preferably, the calibration model can compensate for at least instrument variance. Preferably, the result is transmitted to a user-interface output device in the vicinity of the data acquisition device along the same communication link used for connecting the data acquisition device with the central processor. The result, however, can be forwarded to a different location utilizing the same or a different communication link as desired. Various user interfaces can be provided at one or more locations to enable a user to communicate with the central processor, a data acquisition device, or both. The method of analyzing can encompass a single data acquisition device in a location physically remote from the central processor, or can alternatively include a plurality of data acquisition devices which communicate with the central processor by a communication link. The calibration model loaded onto the central processor is multivariate, and preferably compensates for variations in an effectively comprehensive set of actual measurement conditions and secondary material characteristics.

[0111] In generating this calibration model the secondary variables or influential factors that can significantly impact an instrument's response are identified experimentally. These factors have the potential to influence the results predicted from the model. Generally, it is not possible to determine a priori if a particular variable will influence the results predicted from the model. Thus, these variables are described as potentially influential factors until their status is validated experimentally. It is not necessary to determine the actual physical or chemical cause of a potentially influential factor to identify such a factor experimentally or to generate calibration data resulting from systematic or random variations in the level of such a factor. The modeling process utilized herein accounts for a fundamentally wider range of potentially influential factors, while accommodating for this range of factors in a manner which substantially decreases the amount of time required to maintain the model since the need to develop remedial updates is avoided.

[0112] The invention also relates to the analytical system comprised of at least one data acquisition device in a remote location, and a central processor loaded with one or more calibration model algorithms and connectable by a communication link to the data acquisition device during the measurement process. Preferably, the system also comprises a user-interface that is connectable to the central processor over a communication link and an output device connected to the user interface to receive a result generated by the central processor. Preferably, the output device is located in the vicinity of the detector. Further, the output device preferably is connected to the central processor by the same communication link connecting the data acquisition device and the central processor.

[0113] It is known to employ multivariate calibration modeling in connection with analytical techniques to predict a result. Generally, these calibration models were generated by first determining the property of interest and focusing on variables relating to variation of material characteristics in the population of samples. Generally, attempts were made to eliminate the effects due to other influential variables by controlling those parameters which might vary. The calibration model generated in this way might maintain an acceptable degree of predictability for a period of time, but eventually would reach a point when the accuracy of the predicted values was unacceptable. At that time, the calibration model would have to be regenerated, or the instrument re-calibrated using a remediation update or adjustments to the instrument hardware.

[0114] The multivariate calibration model disclosed herein considers actual measurement conditions as they are expected to occur during future measurements and identifies an effectively comprehensive set of factors or experimental variables which can significantly impact the instrument response. Broadly, the multivariate calibration model employed in the invention is developed by identifying potentially influential factors which may significantly impact the response of the instrument. Experiments are then run to determine whether the potentially influential factors are indeed influential by recording information generated by the instrument, such as a spectrum, at different levels of individual factors or combinations of factors. If this information at different levels of factors shows no significant differences, then these factors are not influential in connection with predicting a value for the property of interest.

[0115] The on-site analytical system can be broadly depicted as shown in FIG. 1. Sensors 2, 4, 6 are examples of data acquisition devices connectable bi-directionally through communication link 8 which in turn is bi-directionally linked through the central processor 10 to pass information. Information traveling from the sensors 2, 4, 6 through the communication link 8 to the central processor 10 is generally identified as measurement data 12. Information transmitted from the central processor 10 through the communication link 8 to user interfaces (not shown) in the vicinity of one or more sensors 2, 4, 6 is generally identified as measurement results 14. Other information such as announcements, status indicators, user preferences, requests for billing information, system usage information and the like may also be transmitted to and from the central processor 10.

[0116] It has been found that utilization of a subset of acquired data from a multichannel instrument, such as a portion or subregion of the available spectrum for a NIR instrument, may generate predicted results that compensate for variations in a wider range of influential factors compared with the results generated from the entire set of multichannel data acquired from a single measurement process for a particular property of interest. A single measurement process may generate a set of multichannel data from the accumulation of multichannel data from individual repetitive runs, and the accumulated data may also be preprocessed. If one or more factors are determined to be influential by experimentation as described above, the instrument response, such as spectral data, may be pre-treated utilizing one or more of several mathematical operations. Pretreatment as used herein encompasses data transformation prior to model prediction. Pretreatment may be used to simplify the data, reduce experimental noise, and eliminate or reduce effects from some secondary variables by mathematical operations such as, but not limited to, filtering the data to reduce its size to one or more smaller subregions of interest and eliminate data from non-interesting or non-beneficial subregions, and applying one or more mathematical transformations such as, but not limited to, weighting, multiplicative scatter correction (MSC), weighted MSC baseline corrections, derivative operations, vector normalization, and standard normal variate and detrend (SNVD).
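As an illustration only, three of the pretreatment operations named above (filtering to a subregion, a standard normal variate scaling, and a simple derivative) might be sketched as follows. The function names, the use of the sample standard deviation, and the first-difference approximation of a derivative are assumptions made for this sketch; the detrend step of SNVD and the MSC variants are not shown.

```python
from statistics import mean, stdev


def select_subregion(wavenumbers, spectrum, low, high):
    """Keep only the channels whose wavenumber lies within [low, high]."""
    return [y for x, y in zip(wavenumbers, spectrum) if low <= x <= high]


def standard_normal_variate(spectrum):
    """Center a spectrum on its own mean and scale it by its own
    standard deviation (the SNV step only; no detrending)."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample standard deviation: an assumption of this sketch
    if s == 0:
        raise ValueError("a flat spectrum cannot be SNV-scaled")
    return [(y - m) / s for y in spectrum]


def first_difference(spectrum):
    """Crude first-derivative operation: difference of adjacent channels."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]
```

Because the SNV step removes a constant offset and a positive multiplicative scale, two spectra that differ only in those respects become identical after pretreatment, which is one way a factor can be removed as a variable of the calibration model.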

[0117] Pretreatment is performed in connection with the process of developing and using a calibration model, and may be performed before or preferably after transmission of acquired data to the central processor 10. In addition, if the instrument generates a spectrum, pretreatment generally includes filtering a spectrum by eliminating unwanted data at selected wavelengths or wave numbers for a particular property model. Each spectrum forwarded from the sensors 2, 4, 6 preferably retains the entire available spectral region. Later manipulation of the spectrum at the central processor 10 can then utilize various subregions as necessary to generate the predicted result for one or more properties of interest. In some cases, pretreatment is observed to eliminate differences between multichannel data sets that had been present in specified spectral subregions before pretreatment while maintaining sufficient information to permit prediction to a desired statistical level. Pretreatment may also be used with acquired data in forms other than as spectra. When elimination or minimization of differences in the transformed data is observed after pretreatment, the potential influence of these factors on the predicted result is eliminated or effectively reduced by the pretreatment operation. In the case of multichannel data such as spectra, filtering combined with other mathematical transformations of the data in appropriate spectral subregions may be able to compensate for variations in the level of one or more factors by eliminating these factors as variables of the calibration model.

[0118] If pretreatment does not eliminate or effectively reduce the spectral differences arising from variations in one or more factors, then the training set is expanded to include spectra corresponding to different levels of those factors, depending on the desired level of precision for the predicted result. In this instance, the properly configured expanded training set will enable a model to be developed that will compensate for variation in the level of the factors by including data in the training set relating to the dependence of the predicted result on the factor. Significantly, it is not necessary to quantify each factor being considered. Rather, a relative or semi-quantitative assessment may be made to ensure that the variation in these factors spans the range from hottest to coldest, highest to lowest, and the like. The analyst needs only to include measurements taken at different levels of each factor over an expected range, where the expected range encompasses levels only in that part of the entire possible range of a factor that are expected to occur during actual measurements in the future. For example, if a particular sample was being presented to a detector and a spectrum was generated, the orientation of the sample container at the time of data acquisition by the detector might be considered a potentially influential factor in the final instrument response. Assume that different orientations of the sample container at the time of analysis will create spectral differences which either must be eliminated by pretreatment or must be compensated by extending the training set. To effect compensation, it is not necessary, and occasionally it is not even possible, to determine or establish a quantitative measure of the exact orientation of the sample container at the time when the detector is used to acquire data from the sample. Under the modeling procedure utilized herein, it is only necessary to evaluate the sample container orientation at various random positions which would be expected to span the range of orientations during future measurements.

[0119] Generally, the number of levels of a primary or secondary variable which should be selected to span the expected range of variation should correspond at least to two plus the expected complexity of a polynomial approximation of the assumed relationship between observed and predicted values. Thus an assumed, approximately quadratic relationship should have four levels of the factor at a minimum.
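The rule of thumb above can be expressed as a short check. This is a sketch that reads "expected complexity" as the polynomial order, which is consistent with the quadratic example (order 2, four levels); the function names are hypothetical.

```python
def minimum_factor_levels(polynomial_order):
    """Minimum number of factor levels: two plus the assumed polynomial order."""
    if polynomial_order < 1:
        raise ValueError("polynomial order must be at least 1")
    return polynomial_order + 2


def has_enough_levels(observed_levels, polynomial_order):
    """True when the training set covers enough distinct levels of a factor.

    Levels may be relative labels (for example "cold", "warm", "hot");
    they do not need to be quantified."""
    return len(set(observed_levels)) >= minimum_factor_levels(polynomial_order)
```

For an approximately quadratic relationship, `minimum_factor_levels(2)` evaluates to 4, matching the minimum stated in the text.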

[0120] As used herein, a property model is an algorithm (or computational procedure) for generating the predicted value of a property of interest for pre-processed data developed from a training set. The property model algorithm is the combination of pretreatment of pre-processed data, followed by evaluation of a calibration model to generate a predicted value from the pretreated, pre-processed data. The property model may be evaluated according to a single mathematical relationship between the instrument response and the property of interest spanning the entire region of the multichannel data, or piecewise from two or more mathematical relationships in various subregions of the multichannel data. In all cases, the algorithm generates a single predicted result that does not depend on knowledge of the particular data acquisition device providing the multichannel data. A global property model is a property model developed from a global training set. Note that a global training set of a property model needs to include data spanning the expected range of only those influential factors that have not been compensated by the pretreatment operations of the property model.
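The two-stage structure of a property model (pretreatment followed by evaluation of a calibration model) might be sketched as below. The linear form of the calibration model, the mean-centering pretreatment, and all names are assumptions chosen for illustration; the specification does not prescribe a particular mathematical form.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


def mean_center(spectrum: Sequence[float]) -> List[float]:
    """Illustrative pretreatment: subtract the spectrum's own mean."""
    m = sum(spectrum) / len(spectrum)
    return [y - m for y in spectrum]


@dataclass
class PropertyModel:
    """Pretreatment plus a calibration model (assumed linear in this sketch)."""
    name: str
    pretreat: Callable[[Sequence[float]], List[float]]
    coefficients: List[float]
    intercept: float

    def predict(self, preprocessed: Sequence[float]) -> float:
        # Note that no instrument identifier is passed in: the prediction
        # depends only on the multichannel data itself.
        treated = self.pretreat(preprocessed)
        if len(treated) != len(self.coefficients):
            raise ValueError("channel count does not match the model")
        return self.intercept + sum(
            c * x for c, x in zip(self.coefficients, treated)
        )
```

In this sketch a constant baseline offset between instruments is removed by the pretreatment, so two spectra that differ only by such an offset yield the same predicted value from the same model parameters.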

[0121] A block diagram showing additional detail of the components and information transmission flow within a data acquisition device is shown in FIG. 2. Sample 20 is deposited into or flows through the sample chamber 24 of the sample presentation device 22. If required by the analytical method, an excitation source 26 is utilized to supply radiation or other form of energy 40 to the sample 20 in the sample chamber 24, the resultant radiation or non-stimulated emission being raw data 42 received by the detector 28. The detector 28 and, if present, the excitation source 26 are located within the analytical instrument 30, which also includes associated components required to generate an instrument response. Information from the analytical instrument 30 is then transferred to the local processor 34 with user interface 32 in the form of digitized data 44 where pre-processing steps may be performed. The user interface 32 passes measurement data 12 to the central processor 10 (not shown) via communication link 8. Measurement results 14 are received from the communication link 8 back to the local processor 34, wherein optionally the resultant information is presented at an output device (not shown) of the user interface 32.

[0122] Pre-processing as used herein encompasses the transformation of raw data prior to communication over a communication link 8 using instrumental transduction and computation steps to a form more readily assimilated by the central processor 10. Some pre-processing steps may be performed by the analytical instrument 30 and other pre-processing steps may be performed by the local processor 34. Pre-processing is independent of the process of developing or using a calibration model.

[0123] Generally, the generation of predicted values is initiated at the sensor 2 as an example of a data acquisition device. The bi-directionality providable through the communication link 8 can permit the central processor 10 to submit a reminder or other trigger signal to the user interface 32 to initiate the acquisition of data by sending an instrument control signal 46 to the analytical instrument 30. For example, detection of a probable outlier in some measurement results 14 at the central processor 10 can cause the central processor 10 to forward a signal to a user interface 32 of the local processor 34 of a need for additional measurement data 12 on the same sample or a new sample. Alternatively, the scan of a sample may desirably be semi-automated so that additional scans are initiated at regular time periods after an initial scan. This prompt can originate at the user interface 32, or it can originate at the central processor 10 which in turn can forward the prompt to the user interface 32.

[0124] Specific combinations of analysis system components have been described. However, it is anticipated that modifications to these combinations will also perform satisfactorily, and are considered to be part of the invention. As an example, the sensor 2 has been described as including a sample presentation device component 22, an analytical instrument component 30 consisting of an excitation source (or other radiation or energy-generating unit) 26 and a detector 28, and a local processor 34 with user interface 32. The sensor 2 initiates data acquisition on the sample 20 in the sample presentation device 22 at the command of the user interface 32, and transmits the acquired measurement data 12 via a communication link 8 to the central processor 10.

[0125] It can be appreciated that the sensor 2 does not need to contain the sample presentation device 22, analytical instrument 30, and local processor 34 in a single enclosure for the system to be able to operate. The sample presentation device 22, analytical instrument 30, and local processor 34 can each be stand-alone units if this becomes necessary or desirable, and combinations of these components can be assembled. The desired method operations occurring in the vicinity of the sensor 2 include: data acquisition of a sample 20 of a material, and transmission of an optionally pre-processed form of the acquired data to the central processor 10. Preferably, the method also includes generation of a predicted result using a property model algorithm stored in the central processor 10 and transmission of the measurement results 14 to one or more user interfaces. The sample may be located in the sample chamber 24 of a sample presentation device 22, but some methods of data acquisition may not require this such as when a probe is used to acquire data from a portion of complete living organisms, such as an ear of corn or intact human skin. The data acquisition may be initiated manually by an operator or automatically from a control signal from instrument control 46 transmitted by the user interface 32. The user interface 32 may reside in the local processor 34, but various input and output devices of a user interface may reside in other locations either internal or external to the data acquisition device 2. The local processor 34 may be a stand-alone unit such as a laptop or desktop personal computer or an assembly of components including one or more computer chips, read-only memory, firmware, and the like located within or external to the data acquisition device. Pre-processing of the acquired data from the measurement process is optional. Further, the measurement results 14 are not necessarily transmitted to the user interface 32 of the local processor 34. Measurement results 14 may be forwarded to one or even several locations removed from the operator at the instruction of the customer.

[0126] For FT-NIR sensors, the raw detected data is in the form of individual scans of interferogram spectra which may be combined by accumulation of data from repeated individual scans. Instrument operation software, such as the Bruker OPUS product, installed on the local processor 34 of the sensor 2 averages the acquired multi-scan spectra in the interferogram mode, converts this average interferogram to a single-channel spectrum by fast Fourier Transform, and converts the single-channel spectrum to an absorbance spectrum according to the equation:

absorbance spectrum = −log (single-channel spectrum / background spectrum)
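The averaging and absorbance steps can be sketched as follows. This is an illustration under stated assumptions, not a description of the Bruker OPUS software: the logarithm is taken as base 10 (the usual absorbance convention; the equation above does not state the base), the Fourier transform step is omitted, and the function names are hypothetical.

```python
import math


def average_scans(scans):
    """Average repeated scans channel by channel."""
    n = len(scans)
    return [sum(channel) / n for channel in zip(*scans)]


def absorbance(single_channel, background):
    """Channel-wise -log10(single-channel / background).

    The base-10 logarithm is an assumption of this sketch."""
    if len(single_channel) != len(background):
        raise ValueError("spectra must have the same number of channels")
    return [-math.log10(s / b) for s, b in zip(single_channel, background)]
```

A channel in which the sample transmits one tenth of the background intensity therefore has an absorbance of about 1, and a channel equal to the background has an absorbance of 0.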

[0127] This calculation removes eccentricities in the sample spectrum attributable to the background spectrum specific to that sensor 2. It is noted that a “single-channel” spectrum is an alternative representation of a spectrum that contains a plurality of data values and, thus, a single-channel spectrum is still a multichannel data set. A background spectrum is acquired by operating the instrument 30 either with no sample in the sample chamber 24 or with a data acquisition probe in direct contact with a reference material such as the reflective surface of a mirror or a spectralon composed of polytetrafluoroethylene. The acquired background spectrum provides a digitized spectrum 44 which is then used to pre-process spectral data according to the above equation. A digitized background spectrum is preferably stored in the local processor 34 of each sensor 2, 4, 6 although other storage locations may be used. Thus, a single background spectrum is generated and stored for each sensor 2, 4, 6 for use in generating a plurality of predicted results from one or more property models in the future. The single background spectrum is generally acquired from a single data acquisition device, but may be generated from an accumulation of background spectra acquired from two or more data acquisition devices. The background spectrum is generally stored in the local processor 34, but may be stored in the central processor 10 or another location remote from the data acquisition device that can be connected to the data acquisition device by a communication link 8.

[0128] A material as used herein encompasses any object for which it is desired to generate a value of a property. The value may be a measurement for which a quantitative result is desired. The value may alternatively be a qualitative one, indicating only the presence or absence of a component. The material may be in any physical form, i.e., gaseous, liquid, solid, or mixed phases, and may encompass both discrete units or components thereof, such as either a whole oilseed or the oil expressed from the seed, and may consist of mixtures of different substances, such as foreign matter in a sample of whole grain. In addition, for certain types of analyses the term may encompass living plant or animal matter, such as human tissue or fluids. The sensor 2 may optionally include an excitation source 26, a sample handling device 22 to present the sample 20 to the detector 28, and associated electronics to convert the detector output into a digitized format 44. A wide range of sensors of various types, herein called sensor-types, can be used to acquire data for subsequent analysis, including but not limited to those for NIR, mid-IR, Raman, UV, visible, and NMR spectroscopy, liquid or gas chromatography, or mass spectrometry. Other spectroscopic and non-spectroscopic types of sensors may also be used.

[0129] If sensors of different types are used for on-site analysis, sensor-type-specific property models are required. A sensor-type is a type of instrumentation that acquires multichannel data based on a specific analytical method, such as Fourier-Transform NIR or gas chromatography. It is preferred that different instruments within a single sensor-type are manufactured according to well defined design specifications as is done for specific models of an instrument, usually by a single manufacturer, such as the MATRIX Model F FT-NIR spectrometer manufactured by Bruker Optics.

[0130] In one embodiment of the present invention, the detector information acquired on a sample 20 may optionally be both pre-processed and converted into a digital format to facilitate rapid communication with the central processor 10 and subsequent data processing. While another embodiment could utilize the transmission of unprocessed multichannel data, the preferred embodiment is advantageous in that digitization and averaging occur prior to transmission of data to the central processor 10.

[0131] The user interface 32 may be installed in an apparatus physically attached to, or integrated with, the sensor 2, but this is not required. Generally, an output device of the user interface 32 is located in the vicinity of the sensor 2 because the resultant information will typically be desired at the location where the data on the particular material is acquired. For example, in the case of the characterization of a property of interest for oilseeds, the measurement data 12 may be acquired at a storage silo, or near a transport truck, and the measurement results 14 are returned to the operator at these locations. Alternatively, the measurement results 14 may be disclosed at an office near the storage silo with optional ancillary equipment such as a printer to generate a written record of the generated result. As a further alternative, however, the measurement results 14 may be disclosed at an administrative or processing location of the customer which is geographically far removed from the storage silo location where the measurement data 12 was acquired. It is possible, though not preferred, to communicate with the operator or other recipient of the result in the vicinity of the sensor 2 by a communication link independent from that of the link 8 connecting the sensor 2 and the central processor 10. Thus, measurement results 14 generated by the central processor 10 may be communicated to the operator or another designated party in an indirect manner such as telephone or facsimile transmission even where the sensor 2 and central processor 10 are linked via the Internet.

[0132] The hardware of the system architecture is characterized by an unusual master-slave relationship established between the one or more sensors 2, 4, 6 and the central processor 10. Since data acquisition is initiated at the sensors 2, 4, 6 the central processor 10 becomes a slave to the sensors 2, 4, 6 in the field. The sensors 2, 4, 6 do not operate as self-contained analyzers, but are dependent on the central processor 10 for data analysis. Thus, the sensors 2, 4, 6 are dumb masters and the central processor 10 is a smart slave in a many-to-one relationship.

[0133] In another embodiment of the system architecture, the central processor 10 sends one or more property models (as, for example, an information packet of parameters sufficient to define one or more property model algorithms) to at least one sensor, e.g., 4, over the communication link 8, at various intervals as desired to enable at least sensor 4 to perform local computations of measurement results. In this mode of operation, the sensor 4 is a self-contained analyzer, although all sensors 2, 4, 6 still use the same property model algorithm for a particular property of interest. Since the algorithm does not contain any instrument-specific parameters, this embodiment is different from calibration transfer and instrument standardization methods which attempt to compensate for instrument variance by developing and transferring instrument-specific calibration models for use by specific instruments. This embodiment is useful as an alternative strategy for situations in which it may not be possible or desirable to use a communication link 8 for real-time model calculations.

[0134] This alternative embodiment can be used as a backup strategy to enable measurements to be performed when interruptions in the communication link 8 have existed or may be expected. This embodiment can also be used for measurements in remote locations where it is impossible, difficult, expensive, or inconvenient to use a communication link 8 to the central processor 10. This mode of operation may enable alternative productive analysis methods as electronic components get smaller and faster, particularly for situations where very rapid response times are desired or when it is desirable to avoid sending data or results over a communication network 8. Presently, this embodiment is less preferred because an update to the property model is not automatically available to the sensor 4. Nonetheless, this embodiment retains the advantage of a single property model algorithm at a point in time from which predictions are made.

[0135] In another embodiment, the current version of the property model algorithm is transmitted to the sensor 4 immediately prior to the data acquisition step of each measurement process to ensure that the measurement results are computed from the most recent update to the property model. In these alternative embodiments, the measurement data and locally computed measurement results may be transmitted to the central processor 10 for storage and distribution at a later time.

[0136] The user interface 32 located near the sensor 2 can provide a selectable menu of properties of interest that are available at the central processor 10. Prior to each measurement, for example, the central processor 10 can transmit the current list of available properties of interest to the sensor 2. Then, the user will always access from the updated selectable menu displayed at sensor 2 the most current list of available properties, and thereby use the most current revision to all property models without needing to manually install software updates to replace, change, add, or delete models as would need to be done if the models were stored in computing devices individually connected to each sensor 2.

[0137] Where multiple sensors 2, 4, 6 may be used, the sensors 2, 4, 6 may be located at a variety of distances from the central processor 10 as needed to provide for on-site analysis. The sensors 2, 4, 6 are typically remote or distant from the central processor 10, where remote refers only to the existence of a physical separation between the sensors 2, 4, 6 and the central processor 10. A remote sensor 2, 4, or 6 does not in any way require that the on-site location is isolated geographically or technologically. Indeed, one or more remote sensors 2, 4, 6 can be located in a central laboratory in a large metropolitan area as well as at isolated sites far removed from population centers.

[0138] One example of a communication link 8 between sensor 2 and central processor 10 which may be utilized is the Internet. In operation using one or more NIR spectrometers as analytical instruments 30, the on-site analysis system utilizes a user interface 32 running in a browser installed in the local processor 34 which performs a security logon function, presents the operator with input fields to identify the sample being analyzed, and prompts the loading of the sample 20 into the sample presentation device 22. The security logon process requires both a password known only to the operator and a sensor 2 identified electronically by a serial number. This logon therefore requires both a specified piece of hardware and an independent password.
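A two-factor logon of this kind (an operator password plus a registered sensor serial number) might be sketched as follows. The salted PBKDF2 hashing, the in-memory registry, and every name here are assumptions made for illustration; the specification does not describe how credentials are stored or compared.

```python
import hashlib
import hmac
import os


def hash_password(password, salt):
    """Derive a password hash with PBKDF2-HMAC-SHA256 (an assumed scheme)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


class LogonRegistry:
    """Hypothetical registry pairing each operator with one sensor serial number."""

    def __init__(self):
        self._accounts = {}

    def register(self, operator, password, serial):
        salt = os.urandom(16)
        self._accounts[operator] = (salt, hash_password(password, salt), serial)

    def logon(self, operator, password, serial):
        record = self._accounts.get(operator)
        if record is None:
            return False
        salt, stored_hash, registered_serial = record
        # Both factors must match: the password and the hardware serial number.
        password_ok = hmac.compare_digest(stored_hash, hash_password(password, salt))
        serial_ok = hmac.compare_digest(registered_serial.encode(), serial.encode())
        return password_ok and serial_ok
```

A correct password presented from an unregistered instrument is rejected, as is a registered instrument presenting a wrong password.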

[0139] The user interface 32 then presents the operator with input fields to identify the sample 20 being analyzed. Examples of descriptive data collected about the sample 20, herein called sample identification data, include but are not limited to the type of sample being tested, the location where the sample was collected, and a unique identity of the sample. The operator is then prompted to load the sample 20 into the specific instrument sample presentation device 22 and to start the collection of measurement information from the multichannel analytical instrument 30, this information herein called multichannel data. In the case of a near-infrared spectrometer, the multichannel data collected is pre-processed spectral data. The measurement data 12 comprises the sample identification data and the multichannel data, which is sent over the Internet 8 to the central processor 10 for analysis. The information packet of measurement data 12 is processed through routers and firewalls where it is received by a standard web server such as the Microsoft® Internet Information Server.
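The information packet of measurement data 12 can be pictured as sample identification data bundled with multichannel data. The sketch below assumes a JSON encoding and hypothetical field names; the specification does not state the wire format.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class SampleIdentification:
    sample_type: str          # type of sample being tested
    collection_location: str  # where the sample was collected
    sample_id: str            # unique identity of the sample


@dataclass
class MeasurementData:
    identification: SampleIdentification
    multichannel_data: List[float]  # e.g. a pre-processed absorbance spectrum


def encode_packet(measurement):
    """Serialize measurement data for transmission (JSON is an assumption)."""
    return json.dumps(asdict(measurement))


def decode_packet(text):
    """Rebuild measurement data from a received packet."""
    raw = json.loads(text)
    return MeasurementData(
        identification=SampleIdentification(**raw["identification"]),
        multichannel_data=raw["multichannel_data"],
    )
```

Keeping the sample type in the packet is what later allows the receiving side to choose a property model without any further input from the operator.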

[0140] The information packet is accepted at the central processor 10 by a web server which initially processes the information and forwards this information to a queuing system. The queuing system accepts near-simultaneous transmissions from multiple sensors and queues the submissions to be processed in FIFO (first in first out) order by the central processor.
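The FIFO behavior of the queuing system can be illustrated with the thread-safe queue from the Python standard library. The class and method names are hypothetical, and the `analyze` callable stands in for the analysis engine.

```python
import queue


class TransactionQueue:
    """Minimal FIFO queue for submissions arriving from multiple sensors."""

    def __init__(self):
        self._queue = queue.Queue()  # thread-safe, first in first out

    def submit(self, sensor_id, packet):
        self._queue.put((sensor_id, packet))

    def process_all(self, analyze):
        """Drain the queue in arrival order, applying `analyze` to each packet."""
        results = []
        while True:
            try:
                sensor_id, packet = self._queue.get_nowait()
            except queue.Empty:
                break
            results.append((sensor_id, analyze(packet)))
        return results
```

Submissions are returned in the order in which they were accepted, regardless of which sensor sent them.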

[0141] The analysis engine of the central processor 10 accepts the information packet of measurement data 12 sent by the queuing system and opens the packet to find both the multichannel data to be analyzed and sufficient descriptive information entered by the operator in the sample identification data to select the proper property model to be used for analysis. After calculating the predicted results through the proper property model, the measurement results 14 are then passed back to the queuing system for communication over the Internet to the operator at the user interface 32, or optionally to another user interface at a location in the same vicinity, a field office of the customer, or another alternate location. The measurement data 12 and the processed measurement results 14 are preferably stored in a database, called the data repository, where they may be retrieved for later reference if desired.

[0142] The communication link 8 used in transmitting data and results, briefly discussed above, is considered in more detail. The communication link 8 may encompass any device or communication system which can provide for information transfer within acceptable limits for signal degradation and transmission speed. For example, as discussed above, the on-site location sensor 2 and central processor 10 may be connected to one another via a global, public network such as the Internet. Alternatively, a local or global private network or combination of public and private communication links may be employed. Communication between the sensor 2 and central processor 10 can be enabled by a user interface 32 consisting of an emulated instrument panel or graphical user interface which runs on a standard Internet browser executing in the local processor 34. The local software component of the instrument panel is a set of HTML-based code that includes additional code such as Java® as an example of client-based software that communicates with the instrument 30. This set of software code is configured to be able to communicate with instruments from multiple manufacturers, or multiple instrument designs from a single manufacturer. Thus, though the sensor 2 may include a near-infrared device, the host-based software component of the instrument panel executing in the central processor 10 is designed to accept input from other spectroscopic and non-spectroscopic devices. The user interface 32, running within a standard Internet browser, is an interactive software application that can be run from any platform that can host a browser, including but not limited to desktop personal computers and laptops, as well as various wireless devices.

[0143] Within a sensor 2, the data acquisition portion of the system is generally the detector 28 of the analytical instrument 30 and the user interface 32. Note that the local processor 34 may be a separate, stand-alone device or may be integrated into the device 2. In either case, the sensor 2 is generally considered to comprise at least the detector component 28 of the analytical instrument 30 and the local processor 34 that runs the user interface 32. The user interface 32 contains the entry point to the system sign-on, input fields for customer authentication and system usage authorization, and other miscellaneous interfaces such as system status, announcements, access to authorized portions of information in a data storage unit to provide additional information reporting capability, and help text to provide users with specific operating instructions. The user interface 32 running on the browser of the local processor 34 of the sensor 2 connects to an analytical instrument 30 such as a NIR detector, in which case the user interface 32 controls data acquisition from instrument 30 and the transmittal of that data over a communication link 8, i.e., the Internet in this instance, to the central processor 10.

[0144] The system architecture based on use of the Internet as the communication link is shown in FIG. 3. This diagram shows software components of the hardware illustrated in FIG. 1. Data acquisition by the analytical instrument 30 is initiated by the user interface 32. Connection between the user interface 32 and the central processor 10 is provided by the ASP (active server page) 50 running on a web server and controlling the communications. The central processor 10 comprises: the ASP 50; a transaction queuing system 52 that provides scalability for transaction volume; the security controller 54 for authenticating and granting processing rights to incoming transactions; the analysis engine 56, which requests parameters of property models from the data repository 58, performs pretreatment operations, and computes the predicted value for each property requested; and the data repository 58, which stores transaction input information, or measurement data 12, and transaction output information, or measurement results 14, for providing potential added informational value to the customer. The data repository 58 may be a SQL database, although other types of databases may be used. It is preferred that the database be relational. Integrated into the central processor 10 is a security architecture that protects proprietary rights to the various data types stored in the data repository 58 of the central processor 10 and delivers information only to those properly authorized to receive the information.

[0145] The queuing system 52 forwards the measurement data 12 to the analysis engine 56, upon which the analysis engine 56 requests and obtains the parameters defining one or more property models from the data repository 58 as specified by the descriptive information in the measurement data packet 12. The analysis engine 56 performs calculations comprising pretreatment, property prediction, and associated statistical measures of the property prediction such as the Mahalanobis distance. The queuing system 52 requests that the data repository 58 accept the measurement results 14 for storage and requests that the web server 50 accept the returned measurement results 14 for transmission to the user interface 32, to one or more other user interfaces, or both.
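The model-selection and prediction flow described above can be sketched in Python. This is a minimal illustration only, not the actual analysis engine: the `Packet` and `PropertyModel` types, the `REPOSITORY` dictionary, the linear form of the model, the mean-centering pretreatment, and the names "material A" and "property y" are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class PropertyModel:
    # Hypothetical stand-in for model parameters held in the data repository 58.
    coefficients: list
    intercept: float


@dataclass
class Packet:
    # Hypothetical layout of a measurement-data packet 12.
    material: str      # descriptive information entered by the operator
    properties: list   # names of the properties requested
    channels: list     # multichannel data from the instrument


# Hypothetical repository keyed by (material, property).
REPOSITORY = {
    ("material A", "property y"): PropertyModel([0.5, -0.25, 1.0], 10.0),
}


def pretreat(channels):
    # Assumed pretreatment for illustration: mean-centering.
    mean = sum(channels) / len(channels)
    return [c - mean for c in channels]


def analyze(packet, repository=REPOSITORY):
    """Select a model per requested property and return predicted values."""
    x = pretreat(packet.channels)
    results = {}
    for prop in packet.properties:
        model = repository[(packet.material, prop)]  # KeyError if no model exists
        results[prop] = model.intercept + sum(
            b * xi for b, xi in zip(model.coefficients, x)
        )
    return results
```

Keying the repository on the descriptive information mirrors the text: the packet itself carries enough information to choose the property model, so the engine needs no per-instrument state.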

[0146] To enable multiple sensors 2 to communicate with the central processor 10, the transaction queuing system 52 of the central processor 10 performs a function of accepting a high volume of near-simultaneous communications from multiple locations over the communication link 8 and controlling the communication flow to the analysis engine 56 and relational database 58. The queuing system 52 is described in detail in a separate patent application, entitled “Extensible Modular Communication Executive With Active Message Queue And Intelligent Message Pre-Validation,” by James Thomas Kent, et al., filed on even date herewith, Serial No. 60/307,347, which is incorporated herein by reference.

[0147] In preparing to conduct an analysis of a material at a variety of remote locations, it is necessary to generate a calibration model which addresses all factors that may significantly influence the measured properties. As noted above, such modelable factors occur in the following areas: the material to be analyzed; the instrument; the environment; and the material presentation. Operator-to-operator variations in the measurement process are generally captured by influential factors associated with the material presentation. The desired reliability of the analysis dictates the number of factors within each of these areas which should be anticipated and modeled.

[0148] The development of calibration models is generally done using one or more sensors that are not necessarily connected to the central processor 10. Furthermore, the calibration models are developed and validated using chemometric software such as OPUS Quant-2 and Ident on a computer that may be different from the local processor 34 of a sensor. Validated calibration models are loaded into the data repository 58 of the central processor 10 to enable on-site measurements by remote sensors 2. In the course of generating an instrument response relating to the property of interest of a sample 20 of a material, using one or more sensors not necessarily connected to the central processor, a univariate calibration would be used if the instrument response were dependent only on the property of interest. Unfortunately, it is rarely possible to obtain ideal measurements of a property where the measurement process is selective for just the property of interest. Realistically, particularly to build models suitable for use by non-specialists, additional variables must be taken into account to reflect the realities of generating a calibration model where variations may be expected to occur in the instrument, the environment and the sample presentation, in addition to variations between and within samples of material. In addition to random measurement noise, the data may be affected by chemical and physical interferences. Chemical interferences alter the instrument response due to the presence of chemical impurities in the material being tested, inhomogeneities in the distribution of chemicals in a mixture, and the like. Physical interferences alter the instrument response due to, for example, light scattering effects and instrument variances.

[0149] A flowchart for establishing the feasibility of developing a property model is provided in FIG. 4. This process is used to determine if a selected measurement method using a specific sensor-type is capable of being used to predict a property of interest over a desired range of that property. Initially, as shown in block 70, the method and objectives are defined. The method of block 70 refers to the analytical method, such as FT-NIR or gas chromatography, and includes specifying the analytical instrument, such as a Bruker MATRIX Model F FT-NIR spectrometer, and the sample presentation, such as 0.5 to 1.5 mL of liquid contained in a metal closure cap with an 18 mm diameter. The objectives of block 70 define expectations for a property model. These objectives identify the property of interest, the desired precision for predictions of the property of interest, the calibration set upon which a feasibility assessment will be made, and the expected ranges of the secondary variables related to the samples and measurement conditions as specified by the customer and as recommended by the analyst responsible for model development. Then, as shown in block 72, the expected range of the primary variable is defined from the objectives of block 70, and a preliminary set of calibration samples is obtained that span the expected range of the primary variable as shown in block 74. The number of samples in this preliminary calibration set typically ranges from about 5 to about 50.

[0150] Next, a preliminary model is built for the primary variable, shown in block 76. At this initial stage of model development, the potentially influential factors are not intentionally varied. Measurements at this stage are taken under ambient conditions with a single instrument as defined by the method and objectives 70. The training set used in block 76 may be generated by taking a single measurement of each sample in the calibration set of block 74 or the training set may be expanded to include repeated measurements of some or all of the calibration set of block 74. Each instrument response in a training set may be generated from the multichannel data acquired during a single multichannel measurement or during two or more multichannel measurements. In some cases, it may be desirable to combine or accumulate the multichannel data acquired from two or more multichannel measurements of different samples, different measurement conditions, or both, into a single set of multichannel data comprising the instrument response.

[0151] Using the symbol “y” to denote the property of interest, the following definitions apply in further considering the calibration process:
y_(obs(i)) = the i^(th) observed or known value of property y. This is also called the true or expected value.
y_(pred(i)) = the i^(th) predicted value of property y. This is also known as the measured value.
Res_(i) = y_(pred(i)) − y_(obs(i)), and is known as the i^(th) residual or deviation between predicted and observed results.
M = the number of predicted values in the instrument response set.
SSE = $\sum_{i=1}^{M} (\mathrm{Res}_{i})^{2}$ = sum of squared errors.
RMSECV = square root of SSE/M for the training set = root mean square error of cross-validation, wherein the calibration model is generated with a training set and predictions from this model were made using the same training set.
RMSEIR = square root of SSE/M for the instrument-response set = root mean square error of predictions from the instrument-response set.
RMSEP = square root of SSE/M for the validation set = root mean square error of prediction, wherein the calibration model is generated with a training set and predictions were made on a validation set.
r² = coefficient of determination, or square of the correlation coefficient (r), which provides a measure of the degree of correlation between y_(obs(i)) and y_(pred(i)).
R² = 100 r²
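As an illustration, the quantities defined above can be computed with a few lines of Python. This is a sketch using only the standard library; the function names and the sample values are assumptions made for the example, and r² is taken literally as the square of the Pearson correlation coefficient between observed and predicted values.

```python
import math


def sse(y_pred, y_obs):
    """Sum of squared residuals, with Res_i = y_pred_i - y_obs_i."""
    if len(y_pred) != len(y_obs) or not y_obs:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - o) ** 2 for p, o in zip(y_pred, y_obs))


def rmse(y_pred, y_obs):
    """Square root of SSE/M: RMSECV, RMSEIR or RMSEP depending on the data set."""
    return math.sqrt(sse(y_pred, y_obs) / len(y_obs))


def r_squared(y_pred, y_obs):
    """Square of the correlation coefficient between observed and predicted values.

    Undefined (division by zero) if either sequence is constant.
    """
    m = len(y_obs)
    mean_o = sum(y_obs) / m
    mean_p = sum(y_pred) / m
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(y_obs, y_pred))
    var_o = sum((o - mean_o) ** 2 for o in y_obs)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov * cov / (var_o * var_p)


# Illustrative values only.
y_obs = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print("SSE =", sse(y_pred, y_obs))
print("RMSE =", rmse(y_pred, y_obs))
print("R2 (percent) =", 100 * r_squared(y_pred, y_obs))
```

The same `rmse` function yields RMSECV when applied to training-set predictions and RMSEP when applied to a validation set; only the data passed in differ.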

[0152] Using commercially available chemometric modeling software such as HOVAL software (such as Version 1.6, 1992), AIRS software (such as Version 1.54, 1999) or Bruker OPUS-NT Quant-2 software (such as Version 3.01, 2000), the initial version of the calibration model is generated, as indicated in block 76, from a training set in which the primary variable spans the anticipated range over which this variable is expected to vary during actual measurements in the future. The coefficient of determination, r², is used to determine whether the correlation is adequate to measure the property, as shown in block 78. Generally, if r² is less than about 0.6, or equivalently if R² is less than about 60, it will be necessary to consider adjusting the method or objectives as shown in block 80. During this adjustment step, a procedure for sample preparation or impurity reduction may be defined or refined, the preliminary calibration set may be modified, or a different method may be selected. If the method or objectives can be altered, then the new definitions are adopted as shown in block 82 and, according to the expected range of block 72 and using a preliminary set of calibration samples from block 74, another preliminary model is developed in block 76. The coefficient of determination, r², of the training set of block 76 is used to decide whether the property of interest is measurable according to the method and objectives of block 82. As long as the property is not measurable (block 78) and the method or objectives can be adjusted (block 80), blocks 82, 72, 74, 76, and 78 are repeated. If the method or objectives cannot be adjusted (block 80), then model development is not feasible (block 98) in accordance with the definitions of block 70 as adjusted by block 82.

[0153] If it is determined that the correlation is adequate (block 78), specifically if r² is greater than about 0.6, although smaller or larger values can be used depending on specific objectives in blocks 70 and 82, it is considered feasible to begin testing for potentially influential factors. The next step is to identify the types and expected ranges of the potentially influential factors as shown in block 84. Next, as shown in block 86, one or more potentially influential factors are selected for experimental investigation. In block 88 experimentation is conducted to determine whether the factors are indeed influential using a small validation set. In block 90 the preliminary model is revised to compensate for secondary variables that have been experimentally demonstrated to be influential factors and appropriate methods of pretreatment are identified when possible. A decision is then made in block 92 to determine if the property of interest is still measurable in the presence of variations in the secondary variables. The measurability of block 92 is determined by comparing the RMSEP of the validation set of block 88 with the desired precision specified in the objectives of blocks 70 and 82. The property is considered measurable in block 92 if the RMSEP is less than or approximately equal to the desired precision value. If not measurable, the method or objectives for developing the calibration model are adjusted as in block 82, if possible (block 80), and the process is repeated beginning again with block 72. If, in block 92, the property of interest is still measurable to the limit of desired precision specified in the objectives of blocks 70 and 82, and if there are more factors in block 94 that have not been investigated experimentally, then the feasibility process returns to block 86 where one or more additional, potentially influential factors are selected, experimentation is conducted in block 88 to determine those additional factors that are influential, the model is revised, methods of data pretreatment are identified in block 90, and a determination is made in block 92 whether the property of interest is still measurable to the limit of desired precision. The process of selecting potential factors (block 86), experimentally determining influential factors (block 88), revising the model and identifying pretreatment methods (block 90), and determining if the property is still measurable to the limit of desired precision (block 92) is repeated until there are no more potentially influential factors to consider in block 94. If the property of interest is ultimately determined to be measurable (block 92) in the presence of variations in the secondary variables identified in the objectives of block 70 as adjusted by block 82, and if there are no more potentially influential factors to consider in block 94, then model feasibility is established as shown in block 96.
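The two decision points of the feasibility flowchart (the r² check of block 78 and the RMSEP check of block 92) can be condensed into a short Python sketch. It is a simplification under stated assumptions: the function name, its arguments and the returned strings are hypothetical, the model-revision step of block 90 is assumed to have already produced one RMSEP per investigated factor, and "less than or approximately equal" is reduced to a plain `<=` comparison.

```python
def assess_feasibility(r2, factor_rmsep, desired_rmsep, r2_min=0.6):
    """Return a short verdict following the decision blocks of FIG. 4.

    r2            -- coefficient of determination of the preliminary model (block 78)
    factor_rmsep  -- dict mapping each investigated factor to the RMSEP obtained
                     with the revised model (blocks 88 to 92)
    desired_rmsep -- limit of desired precision, expressed as an RMSEP
    r2_min        -- threshold of about 0.6 given in the text; adjustable
    """
    if r2 < r2_min:
        return "not measurable: adjust method or objectives (block 80)"
    for name, rmsep in factor_rmsep.items():
        if rmsep > desired_rmsep:
            return f"not measurable with factor '{name}': adjust method or objectives (block 80)"
    # No factors left to investigate (block 94) and all within precision.
    return "model feasibility established (block 96)"


print(assess_feasibility(0.85, {"sample temperature": 0.25, "humidity": 0.28}, 0.30))
```

In practice the loop is iterative, with the model revised after each factor; the sketch only records the outcome of each pass.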

[0154] The process for identifying influential factors and revising the calibration model to accommodate secondary variables that are experimentally demonstrated to be influential factors is considered in detail below.

[0155] To determine if a factor has an influence on the predicted result of a model (block 88), a small validation set is developed which includes measurements at several different levels of the potentially influential factor. Specifically, one or more samples with y_(obs(i)) for the property of interest in the presence of a range of values for the secondary variable being considered are measured with the instrument 30 (FIG. 3), and the i^(th) instrument response is used to generate y_(pred(i)) using a previously generated calibration model. The RMSEP of this validation set is then calculated using this previously generated calibration model to serve as an estimate of the level of precision of the property model in the presence of variations in secondary variables. During this feasibility assessment process, it may be sufficient to use a validation set containing a single pair of observed and predicted values, and use the absolute value of the difference between these values as a single-point estimate of RMSEP. It can be appreciated that different factors will have different degrees of influence on the property of interest values being predicted by the property model. In some applications the desired level of precision as specified in the objectives of blocks 70 and 82 will be very high, and factors having even a very minor influence will be considered in generating the model. In other instances where the desired level of precision is lower, the factors having low levels of influence may be ignored, with consequent losses in precision.

[0156] One method of quantifying an acceptable or desired level of precision involves consideration of a confidence interval for predictions of a property of interest. A confidence interval is a range of predicted values that includes the true average value of the property of interest a specified percentage of the time. Thus, for example, if the average value of a property is 3.5±0.6 at the 95% confidence level, then 95% of the time the predicted value of the property is expected to be in the interval from 2.9 to 4.1. Since the average value of a number of measurements tends to follow a Gaussian distribution, the confidence interval can be expressed as an average value plus or minus a multiplicative factor times the standard deviation of the average value, where the multiplicative factor can be obtained from statistical tables that relate this factor to the area under a standard Gaussian distribution curve. For example, when the multiplicative factor has a value of 1, 2 or 3, the corresponding area under the standard Gaussian distribution is about 68%, 95% or 99.7%, respectively. So, if it is desired to have about 95% of the future predicted values fall in the interval from +0.6 to −0.6 of the true average value of a property, then the multiplicative factor would be 2 and the standard deviation of the average value should be 0.3, which is one-half of the desired precision of 0.6. The standard deviation of the average value can be estimated, for example, by the RMSEP of a validation set. As a result, the desired precision can be expressed as a multiplicative factor times RMSEP. Thus, a desired precision of ±0.6 at the 95% confidence level is equivalent to specifying that the RMSEP should be less than or equal to 0.3. This is equivalent to specifying the desired precision of ±0.6 as 2 times an RMSEP of 0.3 at the 95% confidence level. This is also equivalent to specifying the limit of desired precision as an RMSEP of 0.3 at the 95% confidence level.
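The arithmetic relating desired precision, multiplicative factor and RMSEP can be checked with the Python standard library. The sketch below assumes a Gaussian distribution, as the text does; the function names are illustrative and not part of the described system.

```python
from statistics import NormalDist


def coverage(k):
    """Fraction of a standard Gaussian lying within +/- k standard deviations."""
    nd = NormalDist()
    return nd.cdf(k) - nd.cdf(-k)


def factor_for(confidence):
    """Two-sided multiplicative factor for a confidence level, e.g. 0.95."""
    return NormalDist().inv_cdf((1 + confidence) / 2)


def rmsep_limit(desired_precision, k=2.0):
    """Largest acceptable RMSEP for a desired precision of +/- desired_precision."""
    return desired_precision / k


# Worked example from the text: +/-0.6 with a multiplicative factor of 2.
print(rmsep_limit(0.6, k=2.0))   # RMSEP limit of 0.3
print(coverage(2.0))             # about 0.954, i.e. roughly the 95% level
print(factor_for(0.95))          # about 1.96, commonly rounded to 2
```

Using the rounded factor of 2 rather than the exact value near 1.96 makes the RMSEP limit slightly stricter than the nominal 95% level requires.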

[0157] In general, varying the level of a potentially influential factor produces instrument responses for the validation set that, after pretreatment and evaluation by the calibration model, predict values of the property that differ numerically from the corresponding known values. The numerical differences may or may not be statistically significant. If the RMSEP value computed from these numerical differences is less than or approximately equal to the value used to specify the limit of desired precision, the numerical differences are not statistically significant to that limit of desired precision. Alternatively, if the RMSEP value is greater than the limit of desired precision, the numerical differences are considered to be statistically significant. In those cases when the numerical differences are not statistically significant as measured by RMSEP, the results predicted from the property model are considered to be statistically equivalent. Thus, when varying the level of a potentially influential factor to generate a validation set in block 88 produces statistically equivalent results, the factor is considered to be non-influential for that property model to within the limit of desired precision. Alternatively, if the numerical differences as measured by RMSEP are statistically significant, then the factor is influential and the property model needs to be revised in block 90. If the revised model generated in block 90 then produces results that are not statistically different from the known values, the property is still measurable to the desired precision in block 92.

[0158] Acceptable levels of precision as measured by RMSEP fall within the range of values defined by the particular client acquiring the predicted results as established in blocks 70 and 82. If acceptable precision is obtained in the experimentation of block 88, predictions can be made with that calibration model independent of changes in the secondary variable, the associated factor is considered to be non-influential for that model, and the model does not need to be revised in block 90. The RMSEP is one calculated value which is used to evaluate the level of influence of a factor. A greater increase in RMSEP from a validation set relative to the RMSECV of a training set indicates a greater level of influence. If the RMSEP from the validation set of block 88 does not meet the objectives of blocks 70 and 82, then the model will need to be revised in block 90. Again, the ultimate desired precision of the model will determine if the influence due to a particular factor will require compensation in the development of the model.
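One way to apply this comparison across several candidate factors is sketched below in Python. The function name, the factor names and all numerical values are hypothetical; the sketch simply ranks factors by the increase of their validation-set RMSEP over the training-set RMSECV and flags those whose RMSEP exceeds the limit of desired precision.

```python
def rank_influence(rmsecv, rmsep_by_factor, desired_rmsep):
    """Return (factor, increase over RMSECV, needs_revision) sorted by increase.

    rmsecv          -- RMSECV of the training set
    rmsep_by_factor -- dict mapping factor name to RMSEP of its validation set
    desired_rmsep   -- limit of desired precision, expressed as an RMSEP
    """
    rows = []
    for factor, rmsep in rmsep_by_factor.items():
        increase = rmsep - rmsecv
        needs_revision = rmsep > desired_rmsep  # influential to this limit
        rows.append((factor, increase, needs_revision))
    # Largest increase first: greater increase means greater influence.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Hypothetical values for illustration only.
ranking = rank_influence(
    rmsecv=0.20,
    rmsep_by_factor={"sample temperature": 0.45, "humidity": 0.22, "lamp age": 0.31},
    desired_rmsep=0.30,
)
for factor, increase, needs_revision in ranking:
    print(f"{factor}: +{increase:.2f}, revise model: {needs_revision}")
```

A factor with a small increase may still be flagged if the desired precision is tight, which reflects the point that the desired precision, not the factor alone, decides whether compensation is needed.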

[0159] It can be appreciated that a factor which is ultimately determined to be influential in connection with predicting one property of interest of a sample may not influence the prediction of another property of interest. Thus, the determination of whether a factor is influential is dependent on the specific circumstances of the acquisition of data. Furthermore, it is not required to quantify each of the secondary variables. Thus, it is not necessary to determine the actual humidity inside the measuring instrument, or determine the actual age of the excitation source, such as a light used in a near-infrared spectrometer, as long as data points spanning the expected ranges of humidities and light ages are included in the training set or the validation set, unless pretreatment eliminates such factors. Quantifiable values for the secondary variables are not required but may be recorded if the description of a value is desired.

[0160] The model considers a range of variables wherein the range is defined by the type of analysis being made and the expected range of measurement conditions for the particular property of interest. The primary variable is directed to the particular property of interest. The secondary variables may include but are not limited to the following directed to secondary material characteristics of the sample: impurities or other components in the material to be tested; the form of the sample, i.e., solid, liquid, gas or mixed phases; the presence of turbulence in a gaseous or liquid sample; the presence of multiple phases in a sample, such as gas in a liquid, liquid in a solid, hydrophilic and lipophilic liquids combined as an emulsion, or solids dispersed in a liquid or a gas; the particle size distribution of a solid sample; the presence of inhomogeneities in a sample regardless of form; the distribution of shapes of solid particles; the degree of compaction of a sample of solid particles; the tendency of a sample to alter its composition and structure during the measurement process, such as by the formation of hydrates or oxides in a humid air environment; the tendency of a sample or components thereof to evaporate, sublime or decompose; and the tendency of one or more components of a liquid or gaseous sample to settle or stratify.

[0161] The secondary variables also may include but are not limited to the following directed to the instrument: changes in one or more mechanical, optical or electrical components which would impact signal detection or conversion of the input signal into a format suitable for subsequent processing. These changes include inter-relationships between components such as positional and orientation relationships, and electrical, optical or mechanical interactions in the assembly of the components. For many of the instrument components the effect of age is negligible or unlikely to affect the instrument response, such as regarding the instrument housing or the overload protection circuitry. Other components will create a more dramatic and definite effect on the sample output signal over time, such as a light source in a spectrometer.

[0162] The secondary variables further may include but are not limited to the following directed to the environment and the sample presentation: temperature of the sample; temperature, humidity and atmospheric pressure of the environment in the vicinity of the test instrument; humidity inside the test instrument; airflow in the vicinity of the test instrument; background radiation in the vicinity of the test instrument; the dimensions of the sample container or detector chamber as they affect the pathlength of the excitation beam through the sample; the distance between sample and detector; the presentation speed or flow of the sample relative to the detector and relative to the data acquisition rate of the detector; and the amount of sample presented to the detector, expressed as volume, weight, surface area, pressure, and the like.

[0163] Each of the above variables is considered a potentially influential factor. The actual significance of a secondary variable is determined during the process of developing the calibration model for a particular property of interest.

[0164] In connection with developing and using a calibration model, reference is made to the flowchart in FIG. 5. Initially, the feasibility of establishing a calibration model is determined in block 100, which consists of at least blocks 70, 72, 74, 76, and 78, as well as blocks 80 and 82 as required, of FIG. 4. In addition, after the property is determined to be measurable in block 78, one or more potentially influential factors are investigated experimentally as indicated in blocks 84, 86 and 88. Revisions to the preliminary model in block 90 can be undertaken either by making several stepwise adjustments for one or a small number of factors determined to be influential in block 88, or by adjusting for all experimentally determined influential factors at one time. During these feasibility steps, essential characteristics are identified for the training set along with preliminary methods of pretreatment. The essential characteristics of a training set include the range of material characteristics that must be represented by samples in the calibration set, and the levels of influential factors that must be represented by instrument responses generated from the calibration set. The preliminary methods of pretreatment are those mathematical operations that must be used on optionally pre-processed instrument responses to compensate for influential factors that are not necessarily represented by instrument responses in the training set.

[0165] Considering this information, in block 102 of FIG. 5, a training set is defined and a method of pretreatment is determined in order to begin the process of revising the preliminary model. Then, in block 104 the i^(th) revised model is built and RMSECV(i) is computed, where i equals one for the first revision and i is incremented by one for each subsequent revision. In block 106, if no outliers are found in the training set, then a validation set is defined and the methods of pretreatment are adjusted as shown in block 108. The i^(th) revised model is then validated and RMSEP(i) is calculated in block 110. Next, if outliers are not found in the validation set using this model as indicated at block 112, then a decision is made at block 114 whether to install the new property model in the data repository (block 58, FIG. 3) of the central processor 10. The outcome of block 114 depends on whether the model has been constructed to compensate for an effectively comprehensive set of influential factors. If no, then the process returns to block 108, the validation set is defined to include variations of one or more additional influential factors, the pretreatment is adjusted, and the process continues at block 110. If yes, then the new property model is installed, and on-site measurements can be taken using the new property model as indicated at block 116.

[0166] If probable outliers are found in the training set at block 106 or the validation set at block 112, then the outliers are classified at block 123 and a determination is made if the probable outliers are good outliers at block 124. If any detected outliers are good, then a decision is made to determine whether to extend the training set at block 126. If yes, then the process returns to defining which good outliers will be added to the training set and an appropriate method of pretreatment is determined for this training set, as indicated in block 102. After building the next, or i^(th), revised model and computing RMSECV(i) in block 104, the process continues forward with block 106. If the decision is made not to extend the training set in block 126, or if the probable outlier is not good at block 124, then a decision is made whether to correct or improve the training set in block 128. If erroneous data are found in the training set or the validation set, or if improved known values become available, then the training set is re-defined with the improved or corrected data and the process resumes at block 102. If the decision is made not to extend the training set in block 126, and if there are no corrections or improvements to be made in block 128, then if the current, i^(th) revised model has not been validated in block 130, the process resumes at block 108. The validation steps in blocks 108, 110, and 112 are repeated until no outliers are found in block 112, and a decision is made at block 114 whether to install the new property model. Alternatively, if the decision is made not to extend the training set in block 126, and if there are no corrections or improvements to be made in block 128, and if the current, i^(th) revised model has been validated in blocks 108, 110, and 112, and if it is decided to install the new property model in block 114, the next step is to proceed with taking on-site measurements at block 116.

[0167] While taking on-site measurements, block 116, as well as while building at block 104 and validating at block 110 a property model, the Mahalanobis distance, MAH, is computed for each predicted value. If the MAH is greater than the threshold value for good outliers of that property model, the predicted value is considered to be a probable outlier. Thus, the determination of whether a predicted value of an on-site measurement at block 116 is a probable outlier at block 118 leads to two possible outcomes. If yes, then the predicted result is probably invalid and MAH is used to classify the prediction as a particular type of probable outlier at block 122. If no, then the prediction is presumably valid at block 120. In either case, the results are reported. If a probable outlier is detected, the results include a descriptive interpretation of the type of probable outlier at block 120. The analysis system is then ready to process the next on-site measurement at block 116.

[0168] The descriptive text categorizing the predicted results at block 120 depends on the MAH of the predicted result and on a set of previously determined threshold values of a property model. If, for a particular property model, the threshold value for good outliers is 0.4, the threshold value for bad outliers is 1.0, and the threshold value for extremely abnormal multichannel data is 100, then the descriptive interpretations could be “Possible new type of sample” if MAH is greater than or equal to 0.4 and less than 1.0, “Unexpected result” if MAH is greater than or equal to 1.0 and less than 100, and “Sample not detected” if MAH is greater than or equal to 100. If MAH is less than 0.4, the predicted result is presumably valid and no descriptive interpretation is required. Help text is also provided at an output device of the user interface 32 in the vicinity of the sensor 2 that gives a recommended course of action as specified, for example, by an administrator of the operator's company. Thus, for example, if a predicted result is accompanied by the message “Possible new type of sample,” the help text can instruct the operator to forward the sample to a laboratory for further characterization. If a predicted result is labeled as “Unexpected result,” the operator can be instructed to verify that a sufficient quantity of sample is available, to verify that the type of material is identified correctly in the input fields of the user interface 32, and to take another measurement of the sample. In this case, if a second measurement of the sample gives an “Unexpected result,” then the operator can be instructed to consider the material to be unacceptable. If MAH is greater than or equal to 100, the invalid prediction value can be omitted from the measurement results 14, and the message “Sample not detected” can be accompanied by help text that instructs the operator to contact the provider of the on-site measurement service to help investigate and remedy the problem.
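The threshold logic in this example maps directly onto a small lookup function. The Python sketch below uses the example thresholds of 0.4, 1.0 and 100 and the three messages quoted in the text; the function name and its keyword arguments are assumptions made for illustration, and actual thresholds would be stored per property model.

```python
def interpret_mah(mah, good=0.4, bad=1.0, abnormal=100.0):
    """Map a Mahalanobis distance to a descriptive interpretation.

    Returns None when the prediction is presumably valid (MAH below the
    threshold for good outliers), otherwise one of the example messages.
    """
    if mah >= abnormal:
        return "Sample not detected"
    if mah >= bad:
        return "Unexpected result"
    if mah >= good:
        return "Possible new type of sample"
    return None


for value in (0.1, 0.4, 0.99, 1.0, 57.0, 100.0):
    print(value, "->", interpret_mah(value))
```

Checking the largest threshold first keeps each branch a single comparison and makes the boundary cases (exactly 0.4, 1.0 or 100) fall into the higher category, matching the "greater than or equal to" wording of the example.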

[0169] When outliers are detected during on-site measurements in block 118 and a classification is made of the type of probable outlier in block 122, two events occur. First, the results are reported in block 120 and second, an investigation of the outliers is initiated in block 124. If the outliers are good in block 124, then there is an opportunity to extend the property model to compensate for a wider range of variations in the sample and the measurement conditions. If the outliers are not good in block 124, and if the training set does not need to be corrected or improved in block 128, then there is no opportunity to enhance the model and, since the current property model has not been altered, the current model is considered to be validated in block 130 and on-site measurements continue to be taken in block 116 with no intervention from these two events. If the outliers are good in block 124, then the customer can consult with the provider of the on-site measurement service to decide whether to extend the training set as indicated in block 126. This decision can be based on requirements or preferences of the customer, time and personnel resources of the provider, and economic considerations from both parties.

[0170] The ability of the on-site analysis system to detect probable outliers during on-site measurements at block 118, classify the type of probable outlier at block 122, and identify good outliers at block 124 provides the continual opportunity to adapt the property model to compensate for an effectively comprehensive set of influential factors which may change at some unpredictable time in the future. The outlier detection and classification occur on a real-time basis, so the customer is notified of probable outliers at the earliest opportunity. Furthermore, since the measurement results are stored in the data repository 58 of the central processor 10, the occurrence of probable outliers can trigger an automatic notification to the responsible parties, which can be the provider of the on-site analysis system, one or more administrative personnel at the customer's company, or both. This notification can begin the process of investigating the cause of the outliers and, in cases when one or more detected outliers are good, a decision can be made to extend the training set at block 126, and revise the property model (beginning with block 102 and continuing forward) so it will compensate for an effectively comprehensive set of a wider or different range of influential factors that occur during actual measurements.

[0171] Alternatively, it can be decided that one or more good outliers are not appropriate for inclusion in an extended training set. This decision can result from a situation in which the good outlier will probably be a rare occurrence or a situation in which the good outlier is caused by a type of material or an unusual measurement condition that the customer wants to avoid classifying as a valid measurement. Hence, the “effectiveness” of an effectively comprehensive set of influential factors can be defined or refined for a particular property model. The set of influential factors of a property model is considered to be effectively comprehensive until one or more good outliers are detected during on-site measurements, whereupon either the model is revised to accommodate a wider or different range of influential factors or a decision is made to exclude one or more types of good outliers as valid measurements. The on-site measurement system is adaptable both in its ability to accommodate a potentially changing set of influential factors based on real-time detection of probable outliers and in its ability to refine the definition of valid measurements according to established criteria.

[0172] Generation of a global calibration model involves the development of a global training set, which is then validated. The process of validating the calibration model is not only important in establishing the calibration model initially; the validation process attains even greater importance in maintaining the predictability of the calibration model over time to implement enhancement updates.

[0173] In generating global training sets for developing a property model, it is preferred to exercise discretion in selecting the data used in the training set. Data indiscriminately incorporated into the training set may introduce bad outliers, unwanted good outliers, and essentially duplicative information which, while increasing the size of the training set, does not necessarily improve its quality of prediction.

[0174] Desirably, relative to the total number of observed values of a property of interest available for development of a training set, a first subset of values is selected and used to generate instrument responses such that the property of interest is spanned over its expected range and the values typically span the range at approximately regular intervals. After this subset of values is used to generate a preliminary or a revised version of the calibration model, these observed values and the predicted values from the training set are used to compute the RMSECV. A separate subset of observed and predicted values from a validation set is then used to validate this version of the model and compute the RMSEP. The subset used for validation is generally less than the remainder of the original set of observed values less the first subset. If bad outliers are identified in an instrument-response set, those values are discarded. If a number of good outliers are identified, then consistent with the practice of incorporating minimal numbers of additional calibration data to avoid substantially duplicative or incrementally indistinct data, only a portion of the good outliers may be used to develop an extended training set.

[0175] The process of developing a calibration model that will compensate for an effectively comprehensive set of influential factors includes an investigation of pretreatment operations. Pretreatment consists of filtering a data set from a multichannel instrument to one or more subregions within a data set, operating on the data set with one or more mathematical transformations, or both. The combination of filtering and transforming multichannel data is observed to compensate for variations in some types of influential factors. Filtering may be performed before or preferably after the other data transformations. For other types of influential factors, an effective method of pretreatment may not be identified. In many such cases, compensation for those influential factors is possible by extending the training set with observations that span the expected ranges of the corresponding secondary variables. In some other cases, a combination of extended or partially extended training sets and pretreatment is found to be effective, where a partially extended training set includes observations that span some but not all of the expected range of variations of one or more influential factors. In the remaining cases, pretreatment and enhancement training sets will not provide effective compensation, and it will be necessary to control variations in one or more influential factors in order for the property to be measurable to the level of precision specified in the method and objectives (blocks 70 and 82 of FIG. 4) for the model. In those cases where either pretreatment or extension of a training set is possible, consistent with the practice of incorporating minimal numbers of additional calibration data, pretreatment is preferred.

[0176] The procedure for filtering an instrument response by selecting one or more subregions from the entire region of the instrument response as part of a pretreatment method to improve the predictive capabilities of a calibration model is defined as filter refinement. A flowchart of the filter refinement process is shown in FIG. 6. First, a preliminary property model, labeled the i^(th) property model in block 140, is obtained or developed for the property of interest, and RMSECV(i) is calculated. The i^(th) property model in block 140 can be the preliminary model of block 76 in FIG. 4, a revised preliminary model of block 86 in FIG. 4, the i^(th) revised model of block 104 in FIG. 5, or the i^(th) validated revised model of block 110 in FIG. 5. Filter refinement may be used to adjust the pretreatment in block 108 of FIG. 5.

[0177] Next, in block 142, statistical criteria for acceptable filters are defined in terms of the RMSECV of a preliminary property model that will satisfy the objectives of blocks 70 and 82 in FIG. 4, specifically the maximum value of RMSEP and the maximum absolute value of the offset of the validation curve of predicted versus observed results. For example, it may be determined that the objectives of blocks 70 and 82 for a property of interest will be satisfied if the RMSEP is not greater than 1.5 times the RMSECV of the preliminary model and the maximum absolute offset between predicted and observed values can be allowed to be as great as 50% of this RMSECV in order to compensate for the influential factors, where the offset can be calculated as the absolute value of the difference between the average predicted value and the average observed value.
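As a minimal sketch of the example criteria in this paragraph, the following checks a set of predictions against a reference RMSECV. It assumes the conventional root-mean-square definition of RMSEP; the function names and the sample numbers are hypothetical and serve only to illustrate the arithmetic.

```python
from math import sqrt


def rmsep(observed, predicted):
    """Root mean square error of prediction (conventional definition, assumed here)."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def offset(observed, predicted):
    """Absolute difference between the average predicted and average observed value."""
    return abs(sum(predicted) / len(predicted) - sum(observed) / len(observed))


def meets_criteria(observed, predicted, rmsecv, rmsep_factor=1.5, offset_fraction=0.5):
    """Apply the example limits: RMSEP <= 1.5 x RMSECV and offset <= 50% of RMSECV."""
    return (rmsep(observed, predicted) <= rmsep_factor * rmsecv
            and offset(observed, predicted) <= offset_fraction * rmsecv)


# Hypothetical validation data, not taken from the document.
obs = [2.0, 4.0, 6.0, 8.0]
pred = [2.1, 3.9, 6.2, 8.1]
print(meets_criteria(obs, pred, rmsecv=0.2))  # both limits satisfied
print(meets_criteria(obs, pred, rmsecv=0.1))  # offset limit exceeded
```

The two factors are exposed as parameters because the text presents 1.5 and 50% only as examples of what block 142 might specify.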

[0178] Next, after such statistical criteria are defined, an instrument-response set is defined in block 144 by recording instrument responses acquired as the level of one or more influential factors is varied over a range. As used herein, a global instrument-response set is one for which the range spans the expected range of variations in each of the experimentally determined influential factors, and a partial instrument-response set is one for which the range spans part but not all of the expected range of variations of one or more factors. In the feasibility stage of model development, it is acceptable to use partial instrument-response sets. In the development of a model that is suitable for on-site measurements, it is preferred to use global instrument-response sets.

[0179] Next, in block 146, a procedure involving a search algorithm is used to compute RMSEIR for a number of trial filters containing one or more subregions, where the number of trials j is typically from 50 to 300 for a particular property model, although greater or lesser numbers can be used. Thus, from a number of subregions in a plurality of trial filters comprising discrete combinations of subregions, the multichannel data from applying each trial filter to an instrument response contains at least one subregion of data within the entire available region of data. The instrument-response set for this procedure can be partial or global, although it is preferred that the set be global. The specific subregions evaluated by the search algorithm can be selected by commercial software such as the Bruker OPUS Quant-2 product. These subregions can also be selected manually. The output from the search algorithm can be summarized in a table of trial filters that lists the corresponding RMSEIR and rank of the PLS model for each trial filter, where a trial filter comprises one or more subregions selected from the available multichannel region of a sensor-type. It is convenient to order the trial filters according to RMSEIR(k), where k ranges from 1 to j, as shown in block 148, such that RMSEIR(k) is less than RMSEIR(k+1) for each k from 1 to j−1, but this step is optional.

[0180] Next, in block 150, a decision is made whether one or more trial filters from block 148 satisfy the statistical criteria defined in block 142. If no such trial filter can be identified, then in block 164 the customer can be consulted to determine if less precise predictions will be acceptable. If the customer will change the criteria of block 142 such that at least one trial filter exists that satisfies these criteria and thereby adjusts the objectives defined in blocks 70 or 82 of FIG. 4, or if the outcome of block 150 is yes, then any of these trial filters may be selected as an acceptable filter in block 151. A preferred filter may be selected in block 152 from a group of acceptable filters by the following decision criteria. Criterion A: The acceptable filter that produces the smallest value of RMSEP is most preferred. Criterion B: If an acceptable filter is found that is composed of a smaller number of subregions than that identified by Criterion A, then the acceptable filter composed of a smaller number of subregions is preferred. If two or more such acceptable filters are identified, Criterion A is applied for those filters. Criterion C: If two or more acceptable filters composed of the same number of subregions are found from applying Criterion B, then acceptable filters corresponding to the smallest PLS rank are preferred. If two or more such acceptable filters are identified, Criterion A is applied for those filters.

[0181] In an alternative embodiment of the procedure for selecting the preferred filter, Criterion C may be used before Criterion B. It is also possible to select a preferred filter by using either Criterion B or Criterion C alone.
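One possible reading of Criteria A through C is a lexicographic ordering: fewest subregions first, then smallest PLS rank, then smallest RMSEP as the tie-breaker. The sketch below implements that reading only; the `TrialFilter` type, the `preferred_filter` function, the `rank_first` flag for the alternative embodiment, and all filter values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrialFilter:
    name: str
    subregions: int  # number of subregions making up the filter
    pls_rank: int    # rank of the PLS model built with the filter
    rmsep: float     # RMSEP obtained with the filter


def preferred_filter(acceptable, rank_first=False):
    """Pick a preferred filter from the acceptable ones.

    Default order: Criterion B (fewest subregions), then Criterion C
    (smallest PLS rank), then Criterion A (smallest RMSEP) for ties.
    With rank_first=True, Criterion C is applied before Criterion B.
    """
    def key(f):
        if rank_first:
            return (f.pls_rank, f.subregions, f.rmsep)
        return (f.subregions, f.pls_rank, f.rmsep)
    return min(acceptable, key=key)


# Hypothetical acceptable filters, for illustration only.
filters = [
    TrialFilter("F1", subregions=3, pls_rank=5, rmsep=0.08),
    TrialFilter("F2", subregions=2, pls_rank=6, rmsep=0.10),
    TrialFilter("F3", subregions=2, pls_rank=4, rmsep=0.11),
    TrialFilter("F4", subregions=2, pls_rank=4, rmsep=0.09),
    TrialFilter("F5", subregions=3, pls_rank=3, rmsep=0.12),
]
print(preferred_filter(filters).name)
print(preferred_filter(filters, rank_first=True).name)
```

With these values, F1 has the smallest RMSEP, but the two-subregion filters are preferred over it, and among those the rank-4 filter with the lower RMSEP is chosen. Other readings of the criteria are possible, so this ordering should be treated as an assumption.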

[0182] Next, in block 154 the (i+1)^(th) property model is built from the preliminary (i)^(th) property model by using the preferred filter of block 152. RMSEIR(i+1) is computed in block 156. In block 158, the RMSEIR of the (i+1)^(th) property model must also meet the criteria of block 142, specifically the RMSEIR of the instrument-response set for the (i+1)^(th) revised model must meet the same criteria as the RMSEIR of the instrument-response set for the (i)^(th) property model for the preferred filter. If the RMSEIR of the revised model fails these criteria, and therefore is not acceptable in block 158, then a decision is made in block 162 whether some other acceptable filters of block 151 have not yet been considered. If yes, then the next most preferred filter is selected in step 152, and steps 154, 156, and 158 are repeated. If the RMSEIR of a revised property model satisfies these criteria, then RMSEIR(i+1) is acceptable in block 158, filter refinement is complete in block 160, and the identified preferred filter is called the refined filter.

[0183] Returning to block 164, if the customer criteria cannot be altered, then the filter cannot be refined for the instrument-response set of block 144, and it becomes necessary to extend the global training set in block 167 by including calibration data generated over a range of variations in the influential factors that are expected to occur during future measurements using the property model. The (i+1)^(th) property model is built from this training set in block 168, and RMSECV(i+1) is computed in block 170. The decision in block 172 of whether RMSECV(i+1) is acceptable is based on the same criteria as used for block 158. If RMSECV(i+1) is acceptable in block 172, then the calibration model developed from the extended training set is validated in block 174. If RMSECV(i+1) is not acceptable in block 172, then it is necessary to evaluate alternatives in block 176. These alternatives include searching over additional trial filters in block 146 and continuing forward, using the (i+1)^(th) property model of block 168 as the i^(th) property model in block 140 and continuing forward, or selectively omitting calibration data in the training set of block 167 to define a partially extended training set for developing the i^(th) property model in block 140 and continuing forward. This latter case corresponds to controlling one or more influential factors by deciding to redefine the objectives of the on-site analysis (blocks 70 and 82 of FIG. 4) to exclude measurements under conditions where the property is not measurable to the desired precision or to hold those measurement conditions constant. If none of the above alternatives is acceptable, it may be decided to use a different analytical method in blocks 70 and 82 for on-site analysis.

[0184] The calibration model described herein is generally capable of predicting values for the property of interest by compensating for variations in an effectively comprehensive set of measurement conditions and secondary material characteristics. Secondary variables which potentially influence instrument response can each be evaluated to generate one of three outcomes: the secondary variable has no effect or a minimal effect on instrument response; the secondary variable has an influential effect on instrument response, which can be entirely or substantially compensated by pretreatment; or the secondary variable has an influential effect on instrument response which cannot be compensated adequately by pretreatment, but can be compensated by extension of the training set. Note that pretreatment can include filter refinement. There is also a fourth outcome which does not involve prediction by the calibration model. This fourth outcome may result from determining if a property is measurable in the presence of variations of a secondary variable. In this determination, the variation in the property due to variations in the secondary variable over its expected range is compared with the limit of desired precision for predictions of the property of interest. If the range of this variation is a substantial percentage, or greater, of the limit of desired precision of a predicted result, the ability of the calibration model to predict values for the property of interest is hampered, and may prevent the prediction of usable values. The ability of the model to predict values is a function of the rate of change of the property of interest with respect to each secondary variable. For a particular secondary variable, if this rate of change is relatively small, variations in the secondary variable can be compensated by the property model. If this rate of change is relatively large compared with the limit of desired precision, then the secondary variable must be controlled by restricting the possible range of variation during data acquisition and the objectives of blocks 70 and 82 redefined accordingly, or an acceptable property model cannot be generated.

[0185] The generation of a calibration model according to the invention involves the consideration of experimental factors over a wide range in connection with predicting a property of interest, with testing for potential influence on the predicted result and, when effective methods of pretreatment can be identified, no longer requiring that the training set be expanded by including observations taken at different levels of influential factors. The result of this procedure is a calibration model which accounts for an effectively comprehensive range of influential factors. It is possible that a combination of pre-processing, pretreatment and calibration model revision may be employed in connection with a single variable, either primary or secondary.

[0186] An advantage of the calibration model developed as described herein is the ability to compensate for secondary variables previously considered too significant to overcome in a single calibration model across a group of two or more instruments of a particular sensor-type. One such set of secondary variables is the characteristic influential factors attributable to each measuring instrument in a group of similar instruments, where the collection of variations in these variables is described herein as instrument variance. Because of the complexities of manufacture, and the tolerances which necessarily exist in connection with the manufacture of the component parts, measuring instruments constructed from these component parts will not provide identical output in response to the same sample. Further, the response of each such measuring instrument will differ one from the other over time, as age will have a varying effect on the instrument collectively and the component parts individually. For example, component parts within manufacturing tolerances but produced in different batches may demonstrate different output properties over time. Two different instruments, even though manufactured at the same time, may be used to different degrees, effectively wearing out one instrument faster than the other.

[0187] Instead of generating a calibration model on one instrument, then necessarily transferring that model for use by another and developing instrument-specific correction algorithms, the calibration model generated herein compensates for variations of the characteristics of each instrument within a sensor-type and between such instruments by use of a sensor-type-specific property model. Thus, a single property model is generated for all instruments of a particular sensor-type, not multiple property models which must be replicated and corrected or adjusted individually for each instrument within a sensor-type and which must take into account the individual characteristics of each instrument. Significantly, for a collection of instruments of a particular sensor-type, the property model of the invention does not require any individual identification of the specific sensor in use for the purpose of building or using the model. Identification of the on-site location, and thus the instrument, may be important for billing, forecasting or archival purposes, among others, but the model operates without the need for actually identifying the particular instrument and thus the instrument's characteristics. In this sense, the instrument variance is considered in the same way as variation in temperature or sample presentation.

[0188] Instruments of a particular sensor-type must be sufficiently similar. To determine if discrete instruments of some sensor-type are sufficiently similar, the calibration set or a subset of the calibration set that had been used to build a multivariate model for one instrument is used to validate a second instrument over a range of measurement conditions. The set of predictions from the second instrument using this validation set is obtained from the model developed for the first instrument. If the RMSEP of the validation set is within an acceptable tolerance of RMSECV, for example, less than 1.5 times RMSECV of the training set used to develop the calibration model for the first instrument, the second instrument is determined to be sufficiently similar to the first instrument, and both instruments can use the same property model, that model being the model as developed for the first instrument without any modifications. Therefore, a group of instruments of a particular sensor-type are determined to be sufficiently similar by validating each instrument with one or more global property models. The group of such validated instruments produce results from each instrument that are statistically equivalent using a single property model for each property of interest. An acceptable tolerance of RMSECV of the training set for the property model can be specified as the desired precision value in the objectives of block 70 and/or 82 of FIG. 4.

[0189] The filter refinement procedure of FIG. 6 can be used to develop a property model that will compensate for instrument variance. First, in step 140, a global property model is obtained or developed to compensate for all influential factors except for instrument variance, and RMSECV(i) is calculated. This model is called a single-instrument global property model. After defining statistical criteria in block 142 as previously described, an instrument-response set is defined by taking a number of measurements on one or more instruments different from that used to develop the single-instrument property model. Then, additional steps are performed as indicated in blocks 146, 148, 150, 152, 154, 156, 158, as well as those in blocks 162 and 164 if required, until one of two outcomes occurs. In the first outcome, the filter is refined (160) and the model of step 154 is a multi-instrument global property model, while in the second outcome, it is found that the filter cannot be refined (166). As an alternative to the first outcome, an acceptable multi-instrument global property model can be built by using any acceptable filter. If the filter cannot be refined to at least the level of an acceptable filter, but the extended training set of block 168 leads to a successful validation in step 174, the model of block 168 is an acceptable multi-instrument global property model.

[0190] In reference to the alternatives previously described for block 176, in the case of instrument variance it is possible to selectively omit calibration data in the training set of block 168 acquired from one or more instruments to define a partially extended training set based on two or more instruments, develop the i^(th) property model in block 140 based on this partially extended training set, and continue forward. If this procedure leads to the successful development of a multi-sensor global property model, it is likely that some aspect of the hardware of the omitted analytical instrument is flawed and that instrument should be rejected for use in an on-site sensor. It is also possible to test and accept or reject specific components of an instrument using this procedure.

[0191] Predicted values from a multi-instrument global property model can be used in a quality control procedure to accept or reject analytical instruments or components for use in the sensor devices of an on-site analytical system. In such a quality control procedure, predicted values are generated from one or more untested instruments or components using a previously established multi-instrument global property model. The quality-control data set is the instrument-response set formed from these new predictions. If the RMSEP of this instrument-response set satisfies the criteria of block 158 in FIG. 6, the new instrument or component is acceptable. If not, the training set can be extended in block 167 and a new multi-instrument global property model can be built in block 168. If the resulting RMSECV computed in block 170 is acceptable according to the criteria of block 172, then the property model developed from this extended training set (block 168) is adopted as the multi-instrument global property model. If the RMSECV is not acceptable according to the criteria of block 172, then the instrument is rejected and the new property model built in block 168 is not adopted. Generally, this quality control procedure is performed for all new or untested instruments that will be installed for on-site measurements. In addition, this quality control procedure can also be performed for new or untested components used in generating measurements, such as but not limited to probes, interferometers, and detectors.

[0192] The multi-instrument global property model is able to predict values of properties using single algorithms that do not contain any instrument-specific parameters. The multichannel data acquired from different instruments produce statistically equivalent results without using instrument-specific correction factors. Generally, only one instrument-specific computational data transformation is undertaken during the computation of measurement results 14 from measurement data 12, specifically during pre-processing in the local processor 34 of an individual sensor 2, at which time eccentricities in the sample spectrum attributable to the background spectrum unique to the sensor 2 are removed. No instrument-specific information for use by the modeling algorithms is transmitted to or stored at the central processor 10.

[0193] Though individual instrument characteristics can be compensated in practicing the invention, there are limits to the extent of compensation which can be performed. For example, it is not presently possible to generate a calibration model which can accept instrument responses from different types of spectrometers, such as NIR and Raman, and compensate for the different characteristics of each. The instruments should preferably be of the same sensor-type, usually from the same manufacturer, and be the same model. The instruments within a sensor-type should be sufficiently similar to generate statistically equivalent results, as described earlier. Most preferably, the instrument should exhibit narrow manufacturing tolerances as to those components which affect the instrument's data acquisition performance and its performance over time. Thus, in the case of a NIR spectrometer, it is important for the instruments to exhibit good interferometer alignment. Over time, it is important for the instruments to exhibit good light source reproducibility and data acquisition probe reproducibility. In the case of instruments utilizing other portions of the electromagnetic spectrum or instruments which generate a response non-spectroscopically, reproducibility of the component or components which interact with the sample, and the component or components which register the response, is desired.

[0194] As noted above, the invention also encompasses a method of generating measurement results for a customer and supplying information of value to the customer based on these results. The method incorporates a hardware infrastructure, software and data processing to create a material analysis service which encompasses the collection, transmission and manipulation of data, with delivery of information of value to the customer, to the original submitter of data or to an alternate location. The data and information are transmitted along a communication link.

EXAMPLES

[0195] The following detailed examples describe various aspects of the invention in greater detail. The examples are intended to enable one skilled in the art to practice the invention, not to limit the scope thereof. Numerous variations are possible without deviating from the spirit and scope of the invention.

Example 1

[0196] A feasibility study was done to determine if a property model could be developed to measure the concentration of squalane in squalene. In accordance with block 70 of FIG. 4, the method was defined as FT-NIR using the MATRIX Model F instrument manufactured by Bruker Optics, with sample presentation for liquid samples provided by closure caps with dimensions of 18-mm diameter × 10-mm high manufactured by Cincinnati Container Corporation. The objectives for the property model included measurements of squalane in squalene having concentrations ranging from trace amounts to about 10 weight percent with a limit of desired precision of 0.10% or smaller as measured by RMSEP. The objectives further indicated that the measurements will be taken by non-skilled operators who will dispense about 1 mL of liquid samples into separate, disposable caps, and the sample temperature may vary from about 0° C. to 60° C. In accordance with block 72 of FIG. 4, the expected range of the concentration of squalane in squalene was defined as 0 to 10%, where the % symbol indicates a percentage calculated from the weight of solute and the weight of the solution.

[0197] A set of six samples with 0.00%, 2.00%, 4.00%, 6.00%, 8.00% and 10.00% of squalane (99%, Aldrich Chemical Company) in squalene (97%, Aldrich Chemical Company) were prepared to serve as calibration samples to generate a training set, shown as block 74 (FIG. 4), to assess model feasibility. The known values were determined by calculation of the concentrations expressed as weight percentages using the weights measured by an analytical balance. 1.0 mL of each sample was transferred to a separate, disposable cap. For each sample, one FT-NIR spectrum was acquired in the transflectance mode at ambient temperature (20° C.).

[0198] The type of FT-NIR spectrometer used for all of the examples was the MATRIX Model F instrument manufactured by Bruker Optics equipped with an InGaAs detector and a fiber optic probe approximately 91 cm long. The fiber optic probe was bundled with 200 optical fibers, 100 for illumination and 100 for collection. Each fiber was 100 μm in diameter, and the total illuminating area was about 3 mm in diameter. The spectral resolution of the spectrometer was 8 cm⁻¹ and the available spectral region of the spectrometer was from 4500 cm⁻¹ to 10,000 cm⁻¹. The spectra were acquired within a relatively brief period of time. Individual averaged spectra for a single sample were each typically generated in less than about one minute.

[0199] A button on the fiber optic probe, serving as an input device of the user interface, was depressed to initiate data collection of 20 spectra at a scanning speed of 20 kHz and about 2 scans per second. The spectra were averaged in interferogram mode, converted to single-scan mode by fast Fourier Transform, and then converted to an absorbance spectrum for spectroscopic analysis according to

absorbance spectrum = −log (single-channel spectrum / background spectrum)

[0200] wherein the reference or background spectrum for transflectance measurement was the average of 20 scans measured by direct contact of the fiber optic probe on a mirror surface. The six observed NIR spectra are shown in FIG. 7.
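The absorbance conversion of paragraph [0199] can be sketched as follows. This is a minimal illustration only: it assumes a base-10 logarithm (the text writes only "log"), the function name absorbance is hypothetical, and the intensity values are invented for illustration rather than taken from the measurements.

```python
import math

def absorbance(single_channel, background):
    """Return -log10(sample / background) for each spectral point.

    Assumes both spectra are sampled at the same wavenumbers and that
    all intensities are positive.
    """
    if len(single_channel) != len(background):
        raise ValueError("spectra must have the same number of points")
    return [-math.log10(s / b) for s, b in zip(single_channel, background)]

# Hypothetical single-channel intensities at three wavenumbers.
background = [1.00, 0.80, 0.50]   # mirror reference
sample = [0.10, 0.40, 0.50]       # sample in the cap
spectrum = absorbance(sample, background)
```

A point where the sample returns as much light as the mirror reference has an absorbance of zero, and a tenfold loss of intensity corresponds to an absorbance of one.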

[0201] As indicated in block 76 of FIG. 4, these 6 spectra were the pre-processed instrument responses in the training set used to build an initial calibration model, herein called Model 1.0, according to the PLS method using Bruker OPUS Quant-2 software. In Examples 1 through 15, the calibration models were generated using a computer operating independently of the central processor, though this was not required. Model 1.0 was built using the entire available spectral region from 4500 to 10,000 cm⁻¹ and with no data pretreatment. The feasibility of predictive measurements was assessed by cross-validation of the training set, which produced values of 86.95 for R² and 1.23% for RMSECV with a rank of 3. No outliers were detected in the training set. The observed and predicted values from the training set for Model 1.0 are given in Table 1, and the corresponding calibration curve is shown in FIG. 8. These results, specifically since the coefficient of determination was greater than 60, demonstrated that the property was measurable (block 78) and it was feasible to develop a model (block 100, FIG. 5) to predict the concentration of squalane in squalene.

TABLE 1
Sample No.   Observed (%)   Predicted (%)   Residual (Obs − Pred)
1            0.00           2.71            −2.71
2            2.00           1.92            0.08
3            4.00           3.50            0.50
4            6.00           5.33            0.67
5            8.00           8.54            −0.54
6            10.00          10.89           −0.89

[0202] To increase the sensitivity of the method for the identification of influential factors (block 84) in the following examples, it was preferred to use a refined filter (block 160, FIG. 6) as the pretreatment of block 86. Filter refinement, using the OPUS Quant-2 software to select trial filters, yielded Model 1.1, which was a property model with a refined filter of 4597.5 to 5025.6 cm⁻¹. Model 1.1 gave 99.1 for R² and 0.325% for RMSECV with an optimal rank of 3. The resulting calibration curve and the table of observed and predicted values from the training set for Model 1.1 are shown in FIG. 9 and Table 2. No outliers were detected in the training set.

TABLE 2
Sample No.   Observed (%)   Predicted (%)   Residual (Obs − Pred)
1            0.00           0.49            −0.49
2            2.00           2.19            −0.19
3            4.00           3.94            0.06
4            6.00           5.80            0.20
5            8.00           8.39            −0.39
6            10.00          10.40           −0.40
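The figures of merit reported for Models 1.0 and 1.1 can be checked against the tabulated values. The sketch below assumes the conventional definitions, which the text does not state: RMSECV as the square root of the mean squared residual, and R² as 100 × (1 − SSE/SST). The function names are illustrative; the data are the observed and predicted values of Tables 1 and 2.

```python
import math

def rmse(observed, predicted):
    """Root mean square of the residuals (Obs - Pred)."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

def r_squared_percent(observed, predicted):
    """Coefficient of determination, expressed on a 0-100 scale."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 100.0 * (1.0 - sse / sst)

observed = [0.00, 2.00, 4.00, 6.00, 8.00, 10.00]
model_1_0 = [2.71, 1.92, 3.50, 5.33, 8.54, 10.89]   # Table 1
model_1_1 = [0.49, 2.19, 3.94, 5.80, 8.39, 10.40]   # Table 2

for name, predicted in (("Model 1.0", model_1_0), ("Model 1.1", model_1_1)):
    print(name,
          round(rmse(observed, predicted), 3),
          round(r_squared_percent(observed, predicted), 2))
```

Under these assumed definitions the tabulated values give about 1.23% and 86.95 for Model 1.0 and about 0.325% and 99.1 for Model 1.1, in agreement with the reported figures.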

Example 2

[0203] The light intensity of the environment is a potentially influential factor (block 84). The lights are expected to be either on or off. To determine if a difference in the intensity of background light in the room will affect the concentrations predicted by Model 1.1 (block 88), the spectrum of a single sample with an observed value of 2.00% of squalane in squalene was acquired four times with the overhead fluorescent room lights on and four times with the lights off without changing any other measurement conditions. FIG. 10 shows the superposition of the resulting eight spectra. Since no measurable differences are observed between the spectra, variations in the light intensity of this particular environment will not affect the predicted concentrations.

[0204] To further demonstrate that the light intensity is not an influential factor, Model 1.1 was used to compute predicted values of the concentrations corresponding to each of the eight spectra for the 2.00% squalane samples. As shown in Table 3, the residuals between the observed and predicted values from the validation set, each expressed in percent, are each less than the RMSECV of Model 1.1. Specifically, since each residual value in percent is much less than the RMSECV of 0.325%, and since each residual value is less than the desired precision value of 0.10%, the results were statistically equivalent. Thus, variation in the light intensity of the environment was not an influential factor (block 88) in the prediction of concentrations of squalane in squalene by FT-NIR measurements to a precision within the limit of desired precision, and no revision to the preliminary model (block 90) was required.

TABLE 3
Spectrum No.   Light   Observed (%)   Predicted (%)   Residual (Obs − Pred)
1              Off     2.00           2.01            0.01
2              Off     2.00           1.99            −0.01
3              Off     2.00           2.02            0.02
4              Off     2.00           2.01            0.01
5              On      2.00           2.01            0.01
6              On      2.00           2.01            0.01
7              On      2.00           2.01            0.01
8              On      2.00           2.01            0.01

Example 3

[0205] The orientation of the sample cap is a potentially influential factor (block 84). The orientations are expected to be random. To determine if variation in the orientation of the sample cap (block 86) will affect the predicted concentrations (block 88), a sample of 1.00% of squalane in squalene was prepared and measured with four different cap orientations. The initial orientation of the cap was selected at random, and additional orientations were attained by successively rotating the cap by approximately 90 degrees about an axis perpendicular to the bottom of the cap between measurements. As shown in FIG. 11, measurable differences were observed in these spectra, indicating that orientational variance is probably an influential factor.

[0206] The six calibration samples in the training set of Example 1, each with a sample cap orientation labeled as orientation 1, were then measured with three additional orientations selected at random, labeled as orientations 2, 3, and 4, each rotationally differing by about 90°. It should be understood that a particular numbered orientation, such as orientation 3, indicates only the order in which a particular random orientation was generated in a sequence of random orientations for a sample measurement, so orientation 3, for example, indicates only that this was the third random orientation measured. The resulting additional 18 spectra were used as a validation set to predict squalane concentrations according to Model 1.1 with the refined filter of Model 1.1 for pretreatment (block 90). RMSEP was 0.234%, R² was 99.53, and the validation curve is shown in FIG. 12. The observed and predicted values from the validation set are listed in Table 4.

[0207] Since the RMSEP of the preliminary model in the presence of variations in sample cap orientation exceeded the limit of desired precision, the residuals were statistically significant, and orientational variance was determined to be an influential factor (block 88). It was therefore necessary to build a revised model (block 90) to compensate for orientational variance.

TABLE 4
Sample No.   Orientation   Observed (%)   Predicted (%)   Residual (Obs − Pred)
1            2             0.00           −0.19           0.19
1            3             0.00           −0.36           0.36
1            4             0.00           −0.38           0.38
2            2             2.00           1.87            0.13
2            3             2.00           1.80            0.20
2            4             2.00           1.77            0.23
3            2             4.00           4.02            −0.02
3            3             4.00           3.70            0.30
3            4             4.00           3.90            0.10
4            2             6.00           5.76            0.24
4            3             6.00           5.67            0.33
4            4             6.00           5.66            0.34
5            2             8.00           7.65            0.35
5            3             8.00           7.84            0.16
5            4             8.00           7.81            0.19
6            2             10.00          10.08           −0.08
6            3             10.00          10.06           −0.06
6            4             10.00          10.02           −0.02

[0208] In order to revise the preliminary model so it would compensate for orientational variance, the twenty-four spectra obtained from measurements of the four different orientations at each concentration, which were the spectra used to generate the predicted values listed in Tables 1 and 4, were then used as the training set (block 102, FIG. 5) to build Model 3.0. Assuming there was an approximately quadratic relationship in the observed versus predicted values in FIGS. 9 and 12, four different levels were used for the orientations in the training set. Using the same refined filter identified for Model 1.1, namely 4597.5 to 5025.6 cm⁻¹, Model 3.0 was a property model that yielded values of 99.92 for R² and 0.098% for RMSECV with a rank of 5 (block 104). No outliers were detected (block 106). It is noted that to compensate for orientational variance, Model 3.0 required two additional PLS factors compared with Model 1.1. The calibration curve for Model 3.0 is shown in FIG. 13.

[0209] Model 3.0 was validated using the same calibration set but with validation measurements taken at new random orientations, labeled as orientation 5 for each sample (block 108). The validation gave RMSEP of 0.046% (block 110) which showed a significant improvement in the predicted results compared with the RMSEP of 0.234% obtained using Model 1.1. No outliers were detected (block 112). The validation curve is shown in FIG. 14. The observed and predicted values from the validation set are in Table 5. Since RMSEP was less than the desired precision value of 0.10%, the residuals were statistically insignificant and the revised preliminary model demonstrated that the property was still measurable (block 92) to the limit of desired precision in the presence of variations in orientation.

TABLE 5
Sample No.   Orientation   Observed (%)   Predicted (%)   Residual (Obs − Pred)
1            5             0.00           −0.07           0.07
2            5             2.00           2.05            −0.05
3            5             4.00           3.98            0.02
4            5             6.00           5.94            0.06
5            5             8.00           8.02            −0.02
6            5             10.00          9.98            0.02

Example 4

[0210] The sample pathlength for the squalane-squalene mixture, which is twice the distance from the air-liquid interface at the top of the sample volume to the reflective surface of the sample cap at the bottom of the sample, is a potentially influential factor (block 84) since the intensity of the NIR absorbance by the sample is proportional to the sample pathlength. The pathlength for a particular measurement is determined by the dimensions of the cap and the volume of sample dispensed into the cap.
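The dependence of pathlength on cap dimensions and dispensed volume can be illustrated with a rough geometric estimate. The sketch below rests on assumptions not stated in the text: the cap is treated as an ideal cylinder, the 18-mm diameter is taken as the inner diameter, and the liquid surface is taken as flat. The function name is hypothetical.

```python
import math

def transflectance_pathlength_mm(volume_ml, inner_diameter_mm):
    """Estimate the pathlength as twice the liquid depth in a cylindrical cap."""
    volume_mm3 = volume_ml * 1000.0                      # 1 mL = 1000 mm^3
    base_area_mm2 = math.pi * (inner_diameter_mm / 2.0) ** 2
    depth_mm = volume_mm3 / base_area_mm2                # liquid depth
    return 2.0 * depth_mm                                # light crosses the sample twice

full = transflectance_pathlength_mm(1.0, 18.0)   # roughly 7.9 mm
half = transflectance_pathlength_mm(0.5, 18.0)   # roughly 3.9 mm
```

Under these assumptions the estimated pathlength scales linearly with the dispensed volume, so halving the volume from 1.0 mL to 0.5 mL halves the pathlength.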

[0211] Additional aspects of the method and objectives of block 70 are now defined. Suppose that the disposable caps of Examples 1 to 3 are to be used as economical sample holders for remote measurements. Since these caps are not manufactured identically, variations in the dimensions of the caps, and hence of the sample pathlength for a uniform volume of material, are inevitable. Further suppose that disposable pipettes with 0.25 mL graduations will be used as economical sample dispensers for remote measurements. Variations in sample volume are expected to occur under actual measurement conditions due to variations in operator technique in measuring and dispensing sample volumes. Therefore, pathlength variance is expected to occur during future measurements due to variations in at least two experimental factors involved in sample presentation consisting of dispensing material into different disposable caps using a disposable pipette with 0.25 mL graduations.

[0212] To determine whether dispensing material into different caps can affect the predicted concentrations (block 88), 1.0 mL of 1% squalane in squalene was dispensed into two different caps using a pipette with 0.25 mL graduations. The FT-NIR spectra obtained on these two subsamples, each in a different random orientation, showed observable differences in intensities at various wave numbers as shown in FIG. 15, so pathlength is probably an influential factor.

[0213] Mathematical transformations were next considered for the definition of pretreatment as indicated in block 102. Normalization techniques, such as min-max normalization, vector normalization (VN) and multiplicative scattering correction (MSC), can be used to compensate for at least some of the variation in signal intensity. For example, vector normalization transformed the spectra of FIG. 15 into the spectra of FIG. 16.
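The effect of these normalizations on an intensity change can be sketched as follows. The definitions used here are common ones and are assumptions, since the text does not define them: min-max normalization rescales a spectrum to the range 0 to 1, and vector normalization mean-centers a spectrum and divides it by its Euclidean norm. The spectra are invented; the second is the first multiplied by 2 to mimic a doubled pathlength.

```python
import math

def min_max_normalize(spectrum):
    """Rescale so the minimum is 0 and the maximum is 1 (non-flat spectra only)."""
    lo, hi = min(spectrum), max(spectrum)
    return [(a - lo) / (hi - lo) for a in spectrum]

def vector_normalize(spectrum):
    """Mean-center, then divide by the Euclidean norm of the centered spectrum."""
    mean = sum(spectrum) / len(spectrum)
    centered = [a - mean for a in spectrum]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]

# Hypothetical absorbances; the long-path spectrum is a scaled copy.
short_path = [0.10, 0.30, 0.55, 0.20]
long_path = [2.0 * a for a in short_path]

vn_short = vector_normalize(short_path)
vn_long = vector_normalize(long_path)
```

For a purely multiplicative change both normalizations map the two spectra onto the same curve. Real pathlength differences are not purely multiplicative, which is consistent with the residual differences addressed in the paragraphs that follow.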

[0214] To compensate for differences in spectral intensities which remain after pretreatment, additional calibration data can be included in the training set 102 which intentionally produce a range of spectral intensities if further improvement in the precision of the model is desired. Variations in pathlength can be intentionally introduced by using a selection of different sample volumes to span the range of pathlengths that are expected to occur during future measurements.

[0215] To compensate for shorter pathlengths, for example, a sample volume of 0.5 mL was tested. Model 3.0 predicted that a validation set generated from the instrument responses from 0.5 mL of a 1.00% squalane validation sample in two random orientations contained 2.93% and 2.94% squalane. These predictions were poor because Model 3.0 did not include training measurements or data pretreatment that would compensate for variations in pathlength. Since the residuals of the validation results, 1.93% and 1.94%, were greater than the desired limit of precision, the residuals were statistically significant, pathlength variance was determined to be an influential factor, and it was necessary to build a revised model (block 86).

[0216] Model 4.0 was built (block 104) as a property model by pretreatment of the calibration spectra with vector normalization and a refined filter (block 160, FIG. 6) of 5195.3 cm⁻¹ to 6398.7 cm⁻¹. Model 4.0 produced values of 99.99 for R² and 0.0313% for RMSECV with a rank of 6. The calibration curve from the training set of Model 4.0 is shown in FIG. 17. No outliers were detected (block 106).

[0217] The predicted values for the 0.5 mL 1.00% squalane validation sample in two random orientations (block 108) using Model 4.0 became 1.35% and 1.44%, compared with the corresponding values of 2.93% and 2.94% that had been predicted from Model 3.0 with no adjustments in pretreatment. Since the residuals of the validation measurements using Model 4.0, 0.35% and 0.44%, were greater than the desired precision value, the defined pretreatment was not adequate to compensate for pathlength variance and it was necessary to extend the training set.

[0218] To make further improvements in the accuracy and precision of predicted results for 0.5 mL sample volumes, three additional calibration samples were prepared using low, middle, and high levels of concentrations to span the range of concentrations in the training set. Assuming that the calibration curve for FIG. 17 was linear, three levels were sufficient to span the expected range. Specifically, 0.00%, 6.00% and 10.00% samples were prepared with 0.5 mL sample volumes to build an extended training set (block 102) for Model 4.1. The data pretreatment was the same as that used for Model 4.0, but the training set was extended to include 12 additional spectra of the samples with 0.00%, 6.00% and 10.00% concentrations, taking spectra for each sample in four different random orientations. Model 4.1 (block 104) was a property model that produced values of 99.98 for R² and 0.052% for RMSECV with a rank of 7. No outliers were detected (block 106). The calibration curve of Model 4.1 is shown in FIG. 18.

[0219] The predicted values of concentration for the 1.00% validation sample then became 1.01% and 1.04% for two different random orientations (block 108) using Model 4.1. Since point estimates of RMSEP, taken as the residuals 0.01% and 0.04%, were each less than the desired precision value, the residuals were statistically insignificant and the revised preliminary model demonstrated that the property was still measurable (block 92) in the presence of pathlength and orientational variations. This process can be continued to extend the training set to span the particular range of sample volumes and, hence, pathlengths that are anticipated to occur during actual measurements in the future.

Example 5

[0220] Sample temperature is a potentially influential factor (block 84). A validation sample was prepared with 1.00% of squalane in squalene. FT-NIR spectra were obtained on two subsamples each at sample temperatures of 0° C., 20° C. and 60° C. Predicted values of the squalane concentration in the two validation subsamples were computed for each spectrum using Model 4.1 and listed in Table 6. Since the residuals at 60° C. were greater than the limit of desired precision, temperature was determined to be an influential factor (block 88), and it was necessary to revise the preliminary model (block 90).

TABLE 6 (1.00% squalane)
Subsample   20° C.                     0° C.                      60° C.
Number      Predicted (%)   Residual   Predicted (%)   Residual   Predicted (%)   Residual
1           1.03            −0.03      0.94            0.06       1.28            −0.28
2           1.04            −0.04      0.96            0.04       1.14            −0.14
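The decision applied at block 88 can be sketched as a comparison of each residual with the limit of desired precision. The residuals below are those of Table 6; the dictionary layout, the function name, and the rule of flagging a condition when any single residual exceeds the limit are illustrative assumptions rather than a statement of the procedure actually coded.

```python
PRECISION_LIMIT = 0.10  # limit of desired precision, in weight percent

# Residuals (Obs - Pred) for the two subsamples at each temperature, from Table 6.
residuals_by_temperature = {
    "20 C": [-0.03, -0.04],
    "0 C": [0.06, 0.04],
    "60 C": [-0.28, -0.14],
}

def exceeds_limit(residuals, limit=PRECISION_LIMIT):
    """True if any residual is larger in magnitude than the limit."""
    return any(abs(r) > limit for r in residuals)

# Conditions under which the preliminary model fails the precision objective.
flagged = [t for t, r in residuals_by_temperature.items() if exceeds_limit(r)]
influential = bool(flagged)   # temperature is influential if any condition is flagged
```

With the Table 6 values only the 60° C. measurements are flagged, which is sufficient to treat temperature as an influential factor.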

[0221] To compensate for variations in sample temperature, the training set for Model 4.1 was extended (block 102). The additional calibration spectra were generated by measuring three 1.0 mL calibration samples, with 0.00%, 6.00% and 10.00% squalane concentrations in squalene, each at a low temperature (0° C.) and a high temperature (60° C.), generating spectra using four random cap orientations at each temperature and concentration. Model 5.0 was a property model built from this extended training set (block 104), which was constructed to predict squalane concentration in the range from 0 to 10% with compensation for variations in temperature, cap orientation and pathlength. The refined filter (block 160, FIG. 6) for Model 5.0 was identified as the subregion from 5449.9 cm⁻¹ to 7501.8 cm⁻¹ using vector normalization as the pretreatment transformation (block 102). Model 5.0 produced values of 99.99 for R² and 0.042% for RMSECV (block 104) with a rank of 9. The calibration curve from the training set of Model 5.0 is shown in FIG. 19. No outliers were detected in the training set (block 106). The observed and predicted values are given in Table 7.

TABLE 7
No.   Sample Volume   Temperature   Observed (%)   Predicted (%)   Residual (Obs − Pred)
1     1.0 mL          Ambient       0.00           0.01            −0.01
2     1.0 mL          Ambient       0.00           0.01            −0.01
3     1.0 mL          Ambient       0.00           −0.03           0.03
4     1.0 mL          Ambient       0.00           0.00            0.00
5     1.0 mL          Ambient       2.00           1.98            0.02
6     1.0 mL          Ambient       2.00           2.01            −0.01
7     1.0 mL          Ambient       2.00           1.99            0.01
8     1.0 mL          Ambient       2.00           1.96            0.04
9     1.0 mL          Ambient       4.00           4.08            −0.08
10    1.0 mL          Ambient       4.00           4.02            −0.02
11    1.0 mL          Ambient       4.00           4.01            −0.01
12    1.0 mL          Ambient       4.00           4.04            −0.04
13    1.0 mL          Ambient       6.00           5.94            0.06
14    1.0 mL          Ambient       6.00           5.93            0.07
15    1.0 mL          Ambient       6.00           5.98            0.02
16    1.0 mL          Ambient       6.00           5.94            0.06
17    1.0 mL          Ambient       8.00           7.90            0.10
18    1.0 mL          Ambient       8.00           8.03            −0.03
19    1.0 mL          Ambient       8.00           7.99            0.01
20    1.0 mL          Ambient       8.00           8.05            −0.05
21    1.0 mL          Ambient       10.00          10.01           −0.01
22    1.0 mL          Ambient       10.00          10.02           −0.02
23    1.0 mL          Ambient       10.00          10.01           −0.01
24    1.0 mL          Ambient       10.00          10.06           −0.06
25    0.5 mL          Ambient       0.00           0.00            0.00
26    0.5 mL          Ambient       0.00           −0.01           0.01
27    0.5 mL          Ambient       0.00           0.01            −0.01
28    0.5 mL          Ambient       0.00           0.02            −0.02
29    0.5 mL          Ambient       6.00           6.04            −0.04
30    0.5 mL          Ambient       6.00           6.05            −0.05
31    0.5 mL          Ambient       6.00           5.97            0.03
32    0.5 mL          Ambient       6.00           5.91            0.09
33    0.5 mL          Ambient       10.00          10.05           −0.05
34    0.5 mL          Ambient       10.00          9.98            0.02
35    0.5 mL          Ambient       10.00          10.01           −0.01
36    0.5 mL          Ambient       10.00          9.98            0.02
37    1.0 mL          Low           0.00           −0.02           0.02
38    1.0 mL          Low           0.00           −0.03           0.03
39    1.0 mL          Low           0.00           −0.01           0.01
40    1.0 mL          Low           0.00           0.03            −0.03
41    1.0 mL          Low           6.00           6.10            −0.10
42    1.0 mL          Low           6.00           6.01            −0.01
43    1.0 mL          Low           6.00           5.99            0.01
44    1.0 mL          Low           6.00           6.03            −0.03
45    1.0 mL          Low           10.00          9.99            0.01
46    1.0 mL          Low           10.00          9.93            0.07
47    1.0 mL          Low           10.00          10.00           0.00
48    1.0 mL          Low           10.00          9.98            0.02
49    1.0 mL          High          0.00           0.12            −0.12
50    1.0 mL          High          0.00           −0.05           0.05
51    1.0 mL          High          0.00           0.00            0.00
52    1.0 mL          High          0.00           0.02            −0.02
53    1.0 mL          High          6.00           5.98            0.02
54    1.0 mL          High          6.00           6.03            −0.03
55    1.0 mL          High          6.00           6.00            0.00
56    1.0 mL          High          6.00           6.04            −0.04
57    1.0 mL          High          10.00          9.96            0.04
58    1.0 mL          High          10.00          10.04           −0.04
59    1.0 mL          High          10.00          9.96            0.04
60    1.0 mL          High          10.00          9.96            0.04

[0222] As shown in Table 8A, the results predicted from Model 5.0 for the two original validation samples showed no significant differences from the known value at each measured temperature and a considerable improvement in predictability compared with Table 6.

TABLE 8A (1.00% squalane)
Subsample   20° C.                                 0° C.                                  60° C.
Number      Predicted (%)   Residual (Obs−Pred)    Predicted (%)   Residual (Obs−Pred)    Predicted (%)   Residual (Obs−Pred)
1           0.99            0.01                   0.99            0.01                   1.02            −0.02
2           1.01            −0.01                  1.02            −0.02                  0.98            0.02

[0223] A 2.00% squalane in squalene sample was then measured at two other temperatures within the anticipated 0°-60° C. range to create a small validation set (block 108). Spectra were acquired for this sample at 5° C. and 40° C., and concentrations were predicted based on Model 4.1 without temperature compensation and Model 5.0 with temperature compensation. The predicted values from the validation set are shown below in Table 8B. Since the residuals of Model 5.0 were each less than the desired limit of precision, the revised model was able to compensate for variations in sample temperature, pathlength, and orientation measured within the expected range, and the property was still measurable (block 92).

TABLE 8B (2.00% squalane)
Subsample     Model 4.1                              Model 5.0
Number        Predicted (%)   Residual (Obs−Pred)    Predicted (%)   Residual (Obs−Pred)
1 (5° C.)     1.87            0.13                   2.02            −0.02
2 (5° C.)     1.92            0.08                   2.04            −0.04
1 (40° C.)    2.13            −0.13                  2.03            −0.04
2 (40° C.)    2.12            −0.12                  2.03            −0.03

Example 6

[0224] Humidity of the atmosphere is a potentially influential factor (block 84). Although an NIR instrument may be tightly sealed, moisture may still get into the interior of the instrument over an extended time. Furthermore, part of the light path between the probe and the sample may be open to the environment. Humidity in the air either inside or outside the instrument may affect the obtained NIR spectrum of a sample.

[0225] In general, there are two approaches to overcoming potential variations in environmental humidity. A traditional approach, one that would typically be practiced in a laboratory by trained scientists, would be to measure a background spectrum under the actual environmental conditions immediately before each sample measurement. Then, pre-processing the acquired spectrum of the sample with the background spectrum would eliminate environmental factors such as moisture interference automatically. However, this approach is not convenient or reliable for on-site measurements by non-skilled operators. This traditional approach may also be inadequate in compensating for unexpected short term changes in ambient humidity that could occur, for example, if an operator were to breathe moist air into the environment near the light path to the detector of the instrument.

[0226] The second approach is to include a small number of spectra in the training set that are generated with a range of humidities that would be expected to occur under actual conditions during remote testing. Since the NIR spectral features of water are much sharper than most other types of NIR features from condensed phase samples, variations in humidity can be readily discriminated and compensated by the PLS calibration model. Therefore, the potential interference from variations in humidity can be avoided by extending the training set to include some spectra that span a range of humidities.

[0227] FIG. 20 shows two background spectra, the upper spectrum being taken under conditions of relatively low humidity and the lower spectrum under relatively high humidity. For the purpose of practicing the present invention, it is not necessary to know or quantitate the magnitudes of these humidities, but only to ensure that the range of humidities included in the training set spans the range that is expected to be encountered in the environment under future measurement conditions. Since measurable differences were observed in these spectra, humidity was probably an influential factor.

[0228] To generate spectra of samples for the training set at various levels of humidity, an initial moisture spectrum was required, typically at a relatively low humidity value. First, a background measurement was taken under very dry conditions after desiccant had remained in the tightly sealed instrument for a period of time. Next, the desiccant inside the instrument was removed to allow the internal humidity to increase to a stable value, and an absorbance spectrum at a higher humidity was measured. As shown in FIG. 21, the acquired moisture spectrum (bottom spectrum) was then used to generate two higher humidity spectra by multiplying the acquired spectrum by factors of 2 and 3.

[0229] The spectra used to expand the training set (block 102) were generated mathematically by adding these three moisture spectra to low humidity spectra of samples (taken with the desiccant inside the spectrometer) at 0.00%, 6.00% and 10.00% concentrations in the training set. Model 6.0 was a property model built from this expanded training set using the pretreatment of Model 5.0 (block 104). No outliers were detected (block 106).
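The mathematical generation of humid-condition training spectra can be sketched as a point-by-point addition of a scaled moisture spectrum to a dry sample spectrum. The function name and the numerical values below are hypothetical; only the scaling factors of 1, 2 and 3 come from the text.

```python
def add_moisture(sample_spectrum, moisture_spectrum, factors=(1, 2, 3)):
    """Return one augmented absorbance spectrum per moisture scaling factor."""
    if len(sample_spectrum) != len(moisture_spectrum):
        raise ValueError("spectra must have the same number of points")
    return [
        [s + k * m for s, m in zip(sample_spectrum, moisture_spectrum)]
        for k in factors
    ]

# Hypothetical absorbances at three wavenumbers.
dry_sample = [0.20, 0.35, 0.50]   # sample measured with desiccant in place
moisture = [0.00, 0.04, 0.01]     # acquired moisture spectrum

augmented = add_moisture(dry_sample, moisture)
# augmented[0], augmented[1], augmented[2] correspond to factors 1, 2 and 3.
```

Each dry calibration spectrum yields three additional training spectra, so the humidity range is spanned without physically controlling the humidity for every sample.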

[0230] FIG. 22 shows the superposition of the four spectra of a validation sample prepared with 1.00% squalane before and after the mathematical addition of the moisture spectra at multiplicative scaling factors of 1, 2 and 3. When the spectrum with the highest moisture content of FIG. 22 (block 108) was used as a validation set to predict squalane concentration using Model 5.0, the predicted value was 0.67%. Since the absolute value of the residual, 0.33%, was greater than the desired precision value, this residual was statistically significant, and humidity was determined to be an influential factor. It was therefore necessary to revise Model 5.0 (block 90) to compensate for humidity variance. The predicted value from the validation spectrum using Model 6.0 was 1.02%. Since the residual of 0.02% was less than the limit of desired precision, the property was still measurable (block 92) in the presence of variations in humidity.

Example 7

[0231] The intensity of the excitation light source of the spectrometer is a potentially influential factor (block 84).

[0232] It was found possible to extend a calibration model to compensate for possible variations that can arise over time as a spectroscopic sensor unit ages or, equivalently for the purpose of developing calibration models, for differences in the performance between different spectroscopic sensor units at an arbitrary time. Specifically, it was found that variations in the performance of a small number of components in FT-NIR spectrometers account for most of the variations in the spectra that occur over time or that exist between different spectrometers. Some of these components are the excitation source and the mechanical alignment of the internal optics. Degradation of intensity of the light source or replacement of a light source after failure as well as a shift in the alignment of optical components may cause changes in the instrument responses and, therefore, of the predicted values. Traditionally, correction of such instrument variation would be achieved by re-calibration of each instrument using a remediation update or by adjusting the instrument hardware. The present invention uses a new approach to eliminate the need for frequent remediation updates or to reduce significantly the frequency of re-calibrations and to avoid the need for individual or custom adjustments of instrument-specific calibration transfers on particular equipment.

[0233] FIG. 23 shows the background spectra and the absorbance spectra of a 1.00% squalane validation sample measured by the same FT-NIR instrument but using three different light sources covering a range of performance from a strong, new source to weaker, older sources. The predicted values from these three validation spectra of squalane using Model 6.0 were 1.10%, 1.23% and 1.30%, and RMSEP was 0.226%. Since RMSEP was greater than the limit of desired precision, it was necessary to revise the model to compensate for variance of the excitation source.

[0234] The uppermost spectrum in FIG. 23 was obtained using the same light source as that used in Examples 1 through 6, and the two lower spectra were obtained using weaker light sources. Since there were offset, ramp and non-linear relationships between these spectra, pretreatment transformations (block 102) could be used to reduce significantly the corresponding spectral differences. In the present example, a first derivative transformation effectively eliminated the effects from offset and ramp, and vector normalization or multiplicative scattering correction suppressed the intensity variances due to the non-linear effects from different light intensities. For example, first derivative and vector normalization pretreatment of the three squalane FT-NIR spectra of FIG. 23 effectively reduced the differences between spectra as shown in FIG. 24.
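The combined effect of a first derivative followed by vector normalization can be illustrated on synthetic data. The sketch below is not the OPUS pretreatment: it substitutes a plain central-difference derivative for a smoothed derivative, assumes a uniform wavenumber grid, and uses an invented Gaussian band with a hypothetical scale factor, offset and ramp. It shows only why these distortions drop out.

```python
import math

def first_derivative(spectrum):
    """Central-difference derivative on a uniform grid (interior points only)."""
    return [(spectrum[i + 1] - spectrum[i - 1]) / 2.0
            for i in range(1, len(spectrum) - 1)]

def vector_normalize(spectrum):
    """Mean-center, then divide by the Euclidean norm of the centered spectrum."""
    mean = sum(spectrum) / len(spectrum)
    centered = [a - mean for a in spectrum]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]

def pretreat(spectrum):
    """First derivative followed by vector normalization."""
    return vector_normalize(first_derivative(spectrum))

# Synthetic absorbance band (hypothetical data).
reference = [math.exp(-((i - 10) / 3.0) ** 2) for i in range(21)]
# Same band with a hypothetical scale factor (0.6), offset (0.3) and ramp (0.02 per point).
distorted = [0.6 * a + 0.3 + 0.02 * i for i, a in enumerate(reference)]

max_difference = max(abs(a - b)
                     for a, b in zip(pretreat(reference), pretreat(distorted)))
```

The derivative turns the offset into zero and the ramp into a constant, the mean-centering step removes that constant, and the division by the norm removes the scale factor, so the two pretreated spectra coincide to within rounding error. Non-linear differences of the kind mentioned in the text would be reduced but not removed exactly.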

[0235] Calibration Model 7.0 was a property model built with a refined filter (block 160, FIG. 6) from 5199.2 cm⁻¹ to 8797.7 cm⁻¹ using pretreatment transformations (block 102) consisting of a first derivative transformation with 17 smoothing points followed by vector normalization. Cross-validation of this calibration model gave values of 99.99 for R² and 0.042% for RMSECV (block 104) with a rank of 8. The calibration curve from the training set of Model 7.0 is shown in FIG. 25. No outliers were detected (block 106).

[0236] The predicted values of concentration from the three validation spectra in FIG. 23 using Model 7.0 were 1.02%, 1.02% and 1.03%. The residuals of the predicted values from these spectra, 0.02%, 0.02% and 0.03%, were each less than the limit of desired precision of 0.10%, so the property was still measurable (block 92) in the presence of variation in the excitation source. Furthermore, this example demonstrates that it was possible to compensate for variations in the intensity of the excitation light source by data pretreatment.

[0237] The training set used to develop Model 7.0 is now considered to be a global training set for a single instrument, and Model 7.0 is a single-instrument global property model.

Example 8

[0238] Replacement of a fiber optic probe is a potentially influential factor (block 84).

[0239] It has been found possible to extend a calibration model to compensate for effects from changing certain hardware components as could occur during instrument maintenance. The most likely hardware components that could be replaced in an FT-NIR system over time include the desiccant, the excitation light source, the laser source, and the fiber optic probe. The method for compensating for the variances from aging desiccant and from decay of the excitation light source was described in Examples 6 and 7. The laser is used to track the wavelength accuracy and will be re-calibrated after replacement of a laser source, so the spectra will not be affected significantly if the laser source is replaced. The current example demonstrates how to compensate for future replacement of a fiber optic probe, which may be needed if it becomes accidentally damaged.

[0240] FIG. 26 shows the spectra of a 2.00% squalane validation sample measured by the FT-NIR instrument described in Example 1 using three different fiber optic probes selected to cover a range of fiber optic performances (block 86). The lower spectrum was obtained using the same fiber optic probe as used in Examples 1 through 7, while the other two spectra were obtained using two different fiber optic probes. Since measurable differences were observed among these spectra, probe variance is probably an influential factor. The nature of these spectral differences was similar to that in FIG. 23, so data pretreatment as done to correct for light source decay in Example 7, namely first derivative followed by either vector normalization or multiplicative scattering correction (block 102), would also compensate for transmission differences from different fiber optic probes. The effectiveness of first derivative and vector normalization pretreatment on the spectra of FIG. 26 is shown in FIG. 27.

[0241] The predicted values of the three validation spectra shown in FIG. 26 using Model 6.0, which was a property model that did not include a first derivative transformation, were 2.02%, 4.22% and 4.39%. The residuals, i.e., 2.22% and 2.39%, from the two additional probes were each greater than the limit of desired precision, and probe variance was determined to be an influential factor (block 88). The corresponding values predicted from Model 7.0, which was a property model that included the first derivative and vector normalization data pretreatment (block 90), were 1.99%, 2.03% and 2.02%. Since the absolute values of the residuals, 0.01%, 0.03% and 0.02%, were each less than the desired precision value, the property was still measurable (block 92). Furthermore, the identified pretreatment effectively eliminated the impact on the predicted results from changing fiber optic probes. Thus, Model 7.0 was confirmed to be a single-instrument global property model and validated for a wider range of instrument variance.

Example 9

[0242] The use of different analytical instruments is a potentially influential factor (block 84).

[0243] It has been found possible to share a calibration model among two or more NIR instruments without having to develop individual calibrations for each instrument or having to use instrument standardization or calibration transfer methods. To build models that would compensate for variance between instruments, it was necessary to use a set of instruments that are sufficiently similar.

[0244] A real spectrum can be considered as the end result obtained from the combination of a hypothetical equipment-independent spectrum with spectral features that arise from equipment-dependent optical parts and alignment, which include the light source, interferometer, mirrors, lenses, windows, fiber optics and detectors. Since it is impossible for a manufacturer to produce identical instruments, differences will arise, for example, in the light source intensity, the quality of the optical parts, the alignment of the optical paths, and the response of the detectors.

[0245] One method of extending a single-instrument global property model to multiple instruments is to extend the training set with spectra acquired from several calibration samples that span the expected range of the property of interest over a range of measurement conditions as measured by other instrument systems. This method, applied to two instruments, is illustrated below.

[0246] For this example, the particular instrument that was used to take measurements in Examples 1 through 8 was labeled as Instrument A, and a second instrument as Instrument B (block 86). When Instrument B was used to measure the samples from the training set of Example 7 taking measurements at four different, random orientations, and when the concentrations were predicted using the single-instrument global property Model 7.0, the RMSEP of the validation set was 2.19% and the predicted values exhibited a systematic offset of about 2.2% as shown in FIG. 28. The observed and predicted values from the validation set are shown in Table 9. Since the RMSEP was greater than the limit of desired precision, it would be necessary to revise Model 7.0 to compensate for variance between instruments (blocks 88 and 90).

[0247] For illustrative purposes, the procedure of FIG. 6 was used to build a revised property model from an extended training set (block 167) without using a search algorithm (block 146) to select a refined filter (block 160). The property model and RMSECV of block 140 were obtained from Model 7.0. It was determined that the criteria of blocks 70 and 82 of FIG. 4, and specifically that the desired precision for RMSEP of 0.10%, would be satisfied if the RMSEP of the revised model was not greater than 2 times the RMSECV of Model 7.0, which was about 0.042%, and if the maximum absolute offset was less than 50% of this RMSECV, or less than about 0.021%. These defined the statistical criteria of block 142, which would be used in block 172.

TABLE 9
Sample No.  Observed (%)  Instrument  Orientation  Predicted (%)  Residual (Obs − Pred)
1           0             B           1            −2.21          2.21
1           0             B           2            −2.24          2.24
1           0             B           3            −2.25          2.25
1           0             B           4            −2.20          2.20
2           2             B           1            −0.20          2.20
2           2             B           2            −0.18          2.18
2           2             B           3            −0.16          2.16
2           2             B           4            −0.19          2.19
3           4             B           1            1.83           2.17
3           4             B           2            1.82           2.18
3           4             B           3            1.81           2.19
3           4             B           4            1.79           2.21
4           6             B           1            3.79           2.21
4           6             B           2            3.85           2.15
4           6             B           3            3.86           2.14
4           6             B           4            3.79           2.21
5           8             B           1            5.82           2.18
5           8             B           2            5.83           2.17
5           8             B           3            5.81           2.19
5           8             B           4            5.79           2.21
6           10            B           1            7.87           2.13
6           10            B           2            7.89           2.11
6           10            B           3            7.82           2.18
6           10            B           4            7.79           2.21
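The statistical criteria described in paragraph [0247] can be sketched as a short check, here applied to the Table 9 residuals obtained when Model 7.0 was used with Instrument B. This is an illustrative reading only: the function names are hypothetical, and treating the "offset" as the mean residual is an assumption, since the text does not define how the offset is computed.

```python
import math

# Residuals (Obs - Pred, in %) for Instrument B as listed in Table 9.
residuals = [
    2.21, 2.24, 2.25, 2.20, 2.20, 2.18, 2.16, 2.19,
    2.17, 2.18, 2.19, 2.21, 2.21, 2.15, 2.14, 2.21,
    2.18, 2.17, 2.19, 2.21, 2.13, 2.11, 2.18, 2.21,
]

RMSECV_MODEL_7 = 0.042  # %, RMSECV of Model 7.0 as stated in the text

def rmsep(res):
    """Root mean square error of prediction over a validation set."""
    return math.sqrt(sum(r * r for r in res) / len(res))

def mean_offset(res):
    """Mean residual, used here as the prediction offset (assumption)."""
    return sum(res) / len(res)

def meets_criteria(res, rmsecv):
    """RMSEP no greater than 2 x RMSECV and |offset| below 50% of RMSECV."""
    return rmsep(res) <= 2 * rmsecv and abs(mean_offset(res)) < 0.5 * rmsecv

table_9_rmsep = rmsep(residuals)
table_9_offset = mean_offset(residuals)
table_9_passes = meets_criteria(residuals, RMSECV_MODEL_7)
```

With these residuals the RMSEP and the offset are both close to 2.19%, far outside the limits of about 0.084% and 0.021%, which is consistent with the conclusion that Model 7.0 had to be revised before it could be shared with Instrument B.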

[0248] Calibration samples with concentrations of 0.00%, 6.00% and 10.00% were then measured by Instrument B and the spectra were appended to the training set that had been used to build Model 7.0 (block 102). Model 9.0 was a property model built from this extended training set (block 104). R² became 99.99 and RMSECV became 0.039% with a rank of 9. No outliers were detected (block 106). Since the RMSECV was acceptable in block 172 of FIG. 6, the extension was validated in block 174 and Instruments A and B were sufficiently similar using the extended training set. The multi-instrument calibration curve for the extended training set of Model 9.0 is shown in FIG. 29.

[0249] Validation of Model 9.0 using the remaining samples, 1.00%, 2.00%, 4.00% and 8.00%, taking measurements with four different, random orientations using Instrument B to generate the validation set (block 108), gave an RMSEP of 0.040% with a prediction offset of essentially zero as shown in FIG. 30 (block 110). The observed and predicted values from the validation set are given in Table 10. No outliers were detected (block 112). Model 9.0 is a multi-instrument global property model that has been validated for Instruments A and B.

TABLE 10
Observed (%)  Instrument  Orientation  Predicted (%)  Residual (Obs − Pred)
1.00          B           1            1.01           −0.01
1.00          B           2            1.00           0.00
1.00          B           3            1.04           −0.04
1.00          B           4            0.96           0.04
2.00          B           1            1.98           0.02
2.00          B           2            2.03           −0.03
2.00          B           3            2.05           −0.05
2.00          B           4            2.02           −0.02
4.00          B           1            4.07           −0.07
4.00          B           2            4.04           −0.04
4.00          B           3            4.04           −0.04
4.00          B           4            4.02           −0.02
8.00          B           1            8.07           −0.07
8.00          B           2            8.05           −0.05
8.00          B           3            8.04           −0.04
8.00          B           4            8.04           −0.04

Example 10

[0250] Filter refinement can compensate for variations between instruments. This example demonstrates that predicted values from different instruments can be rendered statistically equivalent using a single property model even though the training set for that model does not include instrument responses from all instruments.

[0251] The selection of spectral subregions that may not necessarily minimize RMSECV or RMSEP using the filter refinement procedure of FIG. 6 is sometimes useful in compensating for instrumental variance and can avoid having to take calibration measurements on specific instruments. This technique to compensate for instrument-to-instrument variance is based on two observations. First, an FT-NIR spectrum has a very broad spectral region (4000 cm⁻¹ to 12,000 cm⁻¹ or 2.5 μm to 0.83 μm), and some narrower subregions within the entire available NIR region are found to be more sensitive to instrumental variance than others. The particular subregions of higher sensitivity are determined, at least in part, by the instrument design and by the properties of the specific components comprising the spectrometer. As a result, these more sensitive subregions often differ between instrument manufacturers and even between different models from the same manufacturer. Second, there is sometimes an option to choose among different spectral subregions that can be used to build a property model. It has been found that if one or more acceptable spectral subregions are chosen to build the property model using filter refinement, then compensating spectra from other instruments may not need to be added to the training set of the model.
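The paired wavenumber and wavelength limits quoted for the FT-NIR region follow from the reciprocal relation between the two quantities, λ (μm) = 10,000 / ν̃ (cm⁻¹). A minimal sketch of that conversion, with a hypothetical function name, is:

```python
def wavenumber_to_micrometers(wavenumber_cm1):
    """Convert a wavenumber in cm^-1 to a wavelength in micrometers."""
    return 1.0e4 / wavenumber_cm1

# Limits of the FT-NIR region quoted in the text.
low_energy_limit_um = wavenumber_to_micrometers(4000.0)
high_energy_limit_um = wavenumber_to_micrometers(12000.0)
```

The same function can be applied to the subregion boundaries listed for the trial regions when a wavelength scale is more convenient than a wavenumber scale.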

[0252] Model 10.0 was a revised property model built by refining the filter for Model 7.0 using OPUS Quant-2 to search for acceptable filters, and changing the spectral region to the refined filter was the pretreatment adjustment of block 108. This chemometric software provided a routine based on three proprietary search algorithms called NIR, General A, and General B. Table 11A summarizes the best trial regions found by OPUS Quant-2 that contained one or two subregions. Three trial regions, namely trial regions 3, 4, and 6, were identified as acceptable filters since they produced RMSEP values less than the desired precision of 0.10%. According to previously described Criterion A, trial region 6 was most preferred. According to Criterion B, however, trial region 3 was preferred over trial region 6 because it had a smaller number of subregions. Since application of Criterion B resulted in a single trial region, Criterion C was not used. Hence, the refined filter (block 160, FIG. 6) for Model 10.0 was selected according to Criterion B to be trial region 3, which was the single subregion from 4597.5 cm⁻¹ to 9395.6 cm⁻¹.

[0253] For illustrative purposes, an alternative development of Model 10.0 was then considered. Since the RMSEP of trial region 6 was much less than the limit of desired precision, there was an opportunity to improve significantly the level of predictability of the model. Suppose that during consultation with the customer, it was decided to define an improved level of predictability such that RMSEP would be less than 0.05%, thereby redefining the objectives of block 70 in FIG. 4. Then, according to the previously described procedure for filter refinement, and using first derivative (21 smoothing points) and vector normalization pretreatment (block 102), the refined filter (block 160, FIG. 6) for Model 10.0 was selected to be trial region 6, which was the composite subregion from 4597.5 cm⁻¹ to 6398.7 cm⁻¹ and 7594.4 cm⁻¹ to 8797.7 cm⁻¹. Cross-validation of Model 10.0 gave 99.99 for R² and 0.046% for RMSECV (block 104) with a rank of 8. The prediction offset was essentially zero, and no outliers were detected (block 106). The cross-validation curve for Model 10.0 is reproduced in FIG. 31.

TABLE 11A
Trial Region  OPUS Procedure  No. Subregions  Subregion 1 (cm⁻¹)  Subregion 2 (cm⁻¹)  Rank  RMSEP (%)  RMSECV (%)
1             NIR             1               5349.6-6101.7       none                7     0.118      0.058
2             NIR             2               4597.5-6101.7       7497.9-9993.4       7     0.298      0.052
3             General A       1               4597.5-9395.6       none                9     0.0734     0.041
4             General A       2               4597.5-6996.5       8793.9-9993.4       7     0.0527     0.055
5             General B       1               4597.5-6398.7       none                10    0.143      0.051
6             General B       2               4597.5-6398.7       7594.4-8797.7       8     0.0396     0.046
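The selection logic applied to Table 11A can be sketched as follows. The helper names are hypothetical, and the functions only paraphrase Criteria A and B as they are applied in this example (lowest RMSEP, and fewest subregions, respectively); the full criteria are defined elsewhere in the specification and may include conditions not shown here.

```python
from typing import NamedTuple

class Trial(NamedTuple):
    region: int
    procedure: str
    subregions: int
    rank: int
    rmsep: float   # %
    rmsecv: float  # %

# Values transcribed from Table 11A.
TABLE_11A = [
    Trial(1, "NIR", 1, 7, 0.118, 0.058),
    Trial(2, "NIR", 2, 7, 0.298, 0.052),
    Trial(3, "General A", 1, 9, 0.0734, 0.041),
    Trial(4, "General A", 2, 7, 0.0527, 0.055),
    Trial(5, "General B", 1, 10, 0.143, 0.051),
    Trial(6, "General B", 2, 8, 0.0396, 0.046),
]

def acceptable(trials, desired_precision):
    """Keep trial regions whose RMSEP is below the desired precision."""
    return [t for t in trials if t.rmsep < desired_precision]

def lowest_rmsep(trials):
    """Paraphrase of Criterion A as used here: prefer the lowest RMSEP."""
    return min(trials, key=lambda t: t.rmsep)

def fewest_subregions(trials):
    """Paraphrase of Criterion B as used here: prefer the fewest subregions."""
    fewest = min(t.subregions for t in trials)
    return [t for t in trials if t.subregions == fewest]

candidates = acceptable(TABLE_11A, 0.10)
by_criterion_a = lowest_rmsep(candidates).region
by_criterion_b = [t.region for t in fewest_subregions(candidates)]
tightened = [t.region for t in acceptable(TABLE_11A, 0.05)]
```

At a desired precision of 0.10% the acceptable regions are 3, 4 and 6, and the fewest-subregions rule leaves only region 3. Tightening the desired precision to 0.05% leaves only region 6, which matches the alternative development described in paragraph [0253].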

[0254] Validation of Model 10.0 using calibration samples and measurement conditions of the training set of Model 7.0 and using instrument responses measured by Instrument B generated a validation set (block 108) that produced an RMSEP of 0.0396% (block 110) without a significant prediction offset as shown in FIG. 32. No outliers were detected (block 112). The observed and predicted values from the validation set are shown in Table 11B. These results demonstrated an acceptable level of precision based on the model generated for Instrument A but used with Instrument B. Therefore, under these measurement conditions it was possible to share a single property model with multiple FT-NIR instruments. Model 10.0 was a global property model for Instruments A and B, and this model was preferred over Model 9.0 since the training set for Model 10.0 did not require training measurements from Instrument B. Model 10.0 was ready for installation in block 114 of FIG. 5.

TABLE 11B
Sample No.  Observed (%)  Instrument  Orientation  Predicted (%)  Residual (Obs − Pred)
1           0.00          B           1            0.03           −0.03
1           0.00          B           2            −0.03          0.03
1           0.00          B           3            −0.05          0.05
1           0.00          B           4            −0.03          0.03
2           2.00          B           1            1.95           0.05
2           2.00          B           2            2.00           0.00
2           2.00          B           3            2.02           −0.02
2           2.00          B           4            1.99           0.01
3           4.00          B           1            4.04           −0.04
3           4.00          B           2            4.00           0.00
3           4.00          B           3            3.99           0.01
3           4.00          B           4            3.97           0.03
4           6.00          B           1            5.94           0.06
4           6.00          B           2            5.96           0.04
4           6.00          B           3            5.95           0.05
4           6.00          B           4            5.91           0.09
5           8.00          B           1            8.02           −0.02
5           8.00          B           2            7.99           0.01
5           8.00          B           3            7.96           0.04
5           8.00          B           4            7.96           0.04
6           10.00         B           1            9.99           0.01
6           10.00         B           2            9.98           0.02
6           10.00         B           3            9.94           0.06
6           10.00         B           4            9.94           0.06

Example 11

[0255] This example illustrates the method of developing several property models for a material after a feasibility study had been completed (block 100, FIG. 5) and an effectively comprehensive set of influential factors had been identified along with appropriate methods of data pretreatment. The method (block 70, FIG. 4) was defined as FT-NIR using the instrument and sample presentation device shown in FIG. 33. The objectives (block 70, FIG. 4) included measurements by non-skilled operators of the total oil, oleic and linolenic contents in canola seeds with measurement precisions characterized by RMSEP values less than 0.6% for each property as predicted by multi-instrument global property models.

[0256] A variety of canola was selected which had been bred to contain oleic acid with a target specification greater than 70% (present as the triglyceride and relative to the total oil content) and linolenic acid with a target specification less than 3.5% (present as the triglyceride and relative to the total oil content). The expected ranges of block 72 were 63% to 75% for oleic content, 2.5% to 7.8% for linolenic content, and 44% to 51% for total oil content. The observed values of total oil content were determined by extraction using a solvent-based extraction method, and those for oleic and linolenic oil content were obtained by analyzing the extracted oil using gas chromatography. The method for determining oil content was AOCS Official Method Am 2-93 (updated 1995). Oleic and linolenic oil content was determined using AOCS Official Method Ce 1-62 (revised 1990). All component concentrations were adjusted to a dry basis by subtracting the actual moisture content in each sample from the total sample weight.
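The dry-basis adjustment described at the end of paragraph [0256] can be written as a one-line calculation. The sketch below assumes that a concentration is reported as a weight percent of the as-is sample and that moisture is likewise a weight percent of the as-is sample; the function name and the example figures (42.0% oil at 6.0% moisture) are hypothetical and do not come from the specification.

```python
def dry_basis_percent(as_is_percent, moisture_percent):
    """Re-express a weight percent relative to the moisture-free sample weight.

    Removing the moisture from the total sample weight shrinks the
    denominator, so the dry-basis value is the as-is value divided by the
    dry fraction of the sample.
    """
    dry_fraction = 1.0 - moisture_percent / 100.0
    return as_is_percent / dry_fraction

# Hypothetical sample: 42.0% total oil measured as-is, 6.0% moisture.
example_dry_basis = dry_basis_percent(42.0, 6.0)
```

For the hypothetical sample the dry-basis oil content is about 44.7%, higher than the as-is value because the same mass of oil is divided by a smaller, moisture-free sample weight.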

[0257] The calibration set of block 74 comprised 45 canola samples that had been selected to cover the expected ranges of concentrations for total oil, oleic oil and linolenic oil, and to span the expected range of secondary material characteristics. The canola seeds typically had diameters ranging from about 1.5 mm to 2.5 mm. The grain had been partially cleaned by sieving, as is commonly done as part of the visual grading used to assess grain quality for grain transactions. A natural selection of foreign matter, called dockage in the grain industry, remained in the samples in amounts up to about two percent by weight.

[0258] For non-destructive FT-NIR measurements of whole grain, FIG. 33 shows the flow-through sample presentation device, which comprised the funnel 202 for presenting grain samples, the flow rate controller 206 for the grain sample, the funnel gate 208 for initiating sample flow, and the grain collector 210, and which was attached to the FT-NIR instrument 200 equipped with fiber optic probe 204. This device was designed to provide a significantly larger sampling area for data acquisition than would be obtained by using a similarly configured fiber optic probe to measure a stationary sample of grain.

[0259] FT-NIR measurements were done by first pouring about 250 grams of canola into the funnel 202, the funnel having an inner cross-section of 120 mm², and then opening the funnel gate 208 half-way to permit the grain to start flowing into grain collector 210. The flow rate was set at the flow rate controller 206 to pass about 10 grams of canola per second. The fiber optic probe 204 was engaged by pressing a button on the probe trigger about one to two seconds after the grain had started to flow to initiate data collection of 40 spectra at a scanning speed of 20 kHz and about 2 scans per second. The spectra were pre-processed by averaging in interferogram mode, converting to single-scan mode by fast Fourier Transform, and then converting to an averaged absorbance spectrum for spectroscopic analysis. The averaged absorbance spectrum was evaluated by OPUS to predict values of total oil content, oleic oil content, linolenic oil content and the Mahalanobis distances from each property model.
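The pre-processing sequence in paragraph [0259] (average the scans as interferograms, transform once, then convert to absorbance) can be sketched with synthetic data. This is a simplified illustration and not the instrument's actual processing: apodization, phase correction and zero filling are omitted, the use of a separately measured reference spectrum for the absorbance step is an assumption, and every name and number other than the count of 40 scans is hypothetical.

```python
import numpy as np

def single_channel(interferograms):
    """Average scans in the interferogram domain, then FFT to a magnitude spectrum."""
    mean_interferogram = np.mean(interferograms, axis=0)
    return np.abs(np.fft.rfft(mean_interferogram))

def absorbance(sample, reference):
    """Absorbance as the negative base-10 log of the sample/reference ratio."""
    return -np.log10(sample / reference)

# Synthetic interferograms: one spectral line whose intensity is halved
# by the sample relative to the reference (hypothetical data).
n_points, n_scans, line = 256, 40, 10
t = np.arange(n_points)
carrier = np.cos(2 * np.pi * line * t / n_points)
reference_scans = np.tile(2.0 * carrier, (n_scans, 1))
sample_scans = np.tile(1.0 * carrier, (n_scans, 1))

sample_sc = single_channel(sample_scans)
reference_sc = single_channel(reference_scans)
band_absorbance = absorbance(sample_sc[line], reference_sc[line])
```

Averaging before the transform means only one Fourier transform is needed per measurement, and for the flowing grain it pools the signal from all 40 scans into a single spectrum that represents the larger sampled area.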

[0260] To develop the training sets for the property models (block 102, FIG. 5), each of the 45 samples of the calibration set was first measured 3 times at ambient temperature to generate repeated measures of the instrument response. Table 12 lists five samples from the calibration set that were then used to generate additional calibration spectra for an extended training set which, in combination with data pretreatment, would compensate for an effectively comprehensive range of influential factors, including temperature, humidity, light source, fiber probe and instrument. These five calibration samples spanned the expected ranges of concentrations of total oil, oleic oil and linolenic oil contents as determined by solvent extraction and gas chromatography. It was demonstrated that the use of different sample presentation devices of the type shown in FIG. 33 was not an influential factor for the properties of interest.

TABLE 12
Sample Label  % Oleic  % Linolenic  % Oil
S1            63.00    7.82         46.30
S2            73.89    3.22         51.20
S3            75.72    2.58         48.10
S4            74.25    2.67         43.80
S5            69.62    3.57         38.68

[0261] Temperature compensation was included in the calibration models to accommodate sample measurements over a wide range of temperatures, specifically from about −60° C. to about 50° C. This was accomplished by first cooling samples S1 to S5 in a freezer at −70° C., bringing the samples to the spectrometer area in contact with dry ice, and then measuring the samples as they warmed slightly during the flow-through sample presentation. These five samples were then heated in an oven at 60° C. and measured as they cooled slightly during flow-through sample presentation. It is conceptually important to note that it is not necessary to know the precise values of the sample temperatures while the samples were warming or cooling during data acquisition in order to build an acceptable multivariate calibration model. Since the NIR spectra of the calibration training set included measurements at various temperatures that spanned the expected range, the PLS procedure generated multivariate models that compensated for non-quantified temperature variance within the range of temperatures used in the training set.

[0262] Humidity compensation was included in the models using the technique of Example 6 by generating spectra with a range of humidities for three canola samples, S1, S3, and S5. The models thus compensated for non-quantified humidity variance.

[0263] To optimize the filter and validate the calibration models, these samples were measured by 4 additional Bruker MATRIX Model F FT-NIR spectrometers 200. Two spectra of each of the five samples of Table 12 measured by two of the spectrometers 200 were included in the extended training set, with the remaining spectra used to create a validation set for the models. The models thus compensated for non-quantified instrument variance. Some representative spectra of the canola measured by one spectrometer are shown in FIG. 34.

[0264] Model 11.0 was a multi-instrument global property model (block 104, FIG. 5) constructed to predict total oil content using a refined filter (block 160, FIG. 6) from 4597.5 cm⁻¹ to 7501.8 cm⁻¹. Data pretreatment also included a first-derivative transformation with 13 smoothing points followed by vector normalization. Cross-validation of Model 11.0 gave 97.89 for R² and 0.527% for RMSECV with a rank of 9 for the concentration range from 38.68% to 51.20% total oil. The calibration curve of Model 11.0 is shown in FIG. 35. The observed and predicted values from the extended training set of Model 11.0 for total oil are in Table 13.

TABLE 13
Spectrum  Sample  Observed (%)  Predicted (%)  Residual (Obs − Pred)
1         1       43.71         42.55          1.16
2         1       43.71         42.46          1.25
3         1       43.71         42.87          0.84
4         2       43.80         44.66          −0.86
5         2       43.80         44.51          −0.71
6         2       43.80         44.89          −1.09
7         4       39.78         38.78          1.00
8         4       39.78         38.96          0.82
9         4       39.78         39.67          0.11
10        5       38.68         38.50          0.18
11        5       38.68         40.06          −1.38
12        5       38.68         39.60          −0.92
13        7       44.25         44.70          −0.45
14        7       44.25         44.86          −0.61
15        7       44.25         44.89          −0.64
16        8       43.35         41.89          1.46
17        8       43.35         42.04          1.31
18        8       43.35         42.36          0.99
19        9       43.23         43.70          −0.47
20        9       43.23         43.58          −0.35
21        9       43.23         43.55          −0.32
22        10      45.86         45.79          0.07
23        10      45.86         45.91          −0.05
24        10      45.86         45.75          0.11
25        11      50.30         51.26          −0.96
26        11      50.30         51.15          −0.85
27        11      50.30         51.08          −0.78
28        12      48.69         48.91          −0.22
29        12      48.69         49.44          −0.75
30        12      48.69         49.78          −1.09
31        13      44.12         44.75          −0.63
32        13      44.12         44.42          −0.30
33        13      44.12         44.57          −0.45
34        14      42.27         42.64          −0.37
35        14      42.27         42.87          −0.60
36        14      42.27         42.55          −0.28
37        15      44.66         45.27          −0.61
38        15      44.66         44.88          −0.22
39        15      44.66         45.24          −0.58
40        16      47.22         47.13          0.09
41        16      47.22         47.65          −0.43
42        16      47.22         47.42          −0.20
43        17      43.06         42.43          0.63
44        17      43.06         42.47          0.59
45        17      43.06         42.86          0.20
46        18      47.27         47.27          0.00
47        18      47.27         47.23          0.04
48        18      47.27         47.13          0.14
49        19      44.45         43.71          0.74
50        19      44.45         43.76          0.69
51        19      44.45         43.81          0.64
52        20      45.99         45.53          0.46
53        20      45.99         45.49          0.50
54        20      45.99         45.89          0.10
55        21      50.19         49.66          0.53
56        21      50.19         49.59          0.60
57        21      50.19         50.08          0.11
58        22      41.80         41.55          0.25
59        22      41.80         41.87          −0.07
60        22      41.80         41.86          −0.06
61        23      43.18         43.61          −0.43
62        23      43.18         43.52          −0.34
63        23      43.18         43.81          −0.63
64        24      46.40         46.13          0.27
65        24      46.40         46.35          0.05
66        24      46.40         46.03          0.37
67        25      46.30         46.71          −0.41
68        25      46.30         47.03          −0.73
69        25      46.30         46.58          −0.28
70        26      51.00         51.70          −0.70
71        26      51.00         51.25          −0.25
72        26      51.00         50.70          0.30
73        27      50.10         50.03          0.07
74        27      50.10         50.54          −0.44
75        27      50.10         49.96          0.14
76        28      51.20         50.97          0.23
77        28      51.20         51.14          0.06
78        28      51.20         50.70          0.50
79        29      48.10         47.65          0.45
80        29      48.10         47.88          0.22
81        29      48.10         47.47          0.63
82        30      45.50         45.59          −0.09
83        30      45.50         45.26          0.24
84        30      45.50         45.50          0.00
85        31      46.90         46.62          0.28
86        31      46.90         46.70          0.20
87        31      46.90         46.35          0.55
88        32      42.80         43.83          −1.03
89        32      42.80         43.25          −0.45
90        32      42.80         43.34          −0.54
91        33      42.20         42.37          −0.17
92        33      42.20         42.08          0.12
93        33      42.20         42.16          0.04
94        34      47.20         47.28          −0.08
95        34      47.20         47.75          −0.55
96        34      47.20         47.19          0.01
97        35      47.80         47.80          0.00
98        35      47.80         47.13          0.67
99        35      47.80         47.61          0.19
100       36      43.80         43.18          0.62
101       36      43.80         43.55          0.25
102       36      43.80         43.84          −0.04
103       37      43.80         43.52          0.28
104       37      43.80         43.70          0.10
105       37      43.80         43.92          −0.12
106       38      44.50         45.05          −0.55
107       38      44.50         44.46          0.04
108       39      49.40         49.35          0.05
109       39      49.40         48.75          0.65
110       39      49.40         48.96          0.44
111       41      40.00         40.67          −0.67
112       41      40.00         40.70          −0.70
113       41      40.00         40.60          −0.60
114       42      41.10         40.75          0.35
115       42      41.10         40.94          0.16
116       42      41.10         41.09          0.01
117       43      50.10         50.45          −0.35
118       43      50.10         50.29          −0.19
119       43      50.10         50.43          −0.33
120       44      45.50         45.01          0.49
121       44      45.50         45.38          0.12
122       44      45.50         45.75          −0.25
123       28      51.20         50.70          0.50
124       28      51.20         50.89          0.31
125       28      51.20         50.86          0.34
126       29      48.10         47.54          0.56
127       29      48.10         47.31          0.79
128       29      48.10         48.01          0.09
129       27      50.10         50.01          0.09
130       27      50.10         49.99          0.11
131       27      50.10         49.32          0.78
132       2       43.80         44.26          −0.46
133       2       43.80         44.70          −0.90
134       2       43.80         44.40          −0.60
135       5       38.68         39.43          −0.75
136       5       38.68         39.31          −0.63
137       5       38.68         39.36          −0.68
138       25      46.30         46.11          0.19
139       25      46.30         46.17          0.13
140       25      46.30         46.11          0.19
141       28      51.20         51.64          −0.44
142       28      51.20         51.51          −0.31
143       28      51.20         51.60          −0.40
144       29      48.10         47.51          0.59
145       29      48.10         47.91          0.19
146       29      48.10         47.94          0.16
147       27      50.10         49.75          0.35
148       27      50.10         50.00          0.10
149       27      50.10         49.93          0.17
150       2       43.80         42.85          0.95
151       2       43.80         43.50          0.30
152       2       43.80         43.44          0.36
153       5       38.68         39.09          −0.41
154       5       38.68         39.06          −0.38
155       5       38.68         38.73          −0.05
156       25      46.30         46.33          −0.03
157       25      46.30         46.76          −0.46
158       25      46.30         46.13          0.17
159       2       43.80         43.57          0.23
160       2       43.80         44.06          −0.26
161       5       38.68         39.34          −0.66
162       5       38.68         39.72          −1.04
163       25      46.30         46.53          −0.23
164       25      46.30         45.75          0.55
165       28      51.20         51.25          −0.05
166       28      51.20         50.86          0.34
167       29      48.10         47.09          1.01
168       2       43.80         43.64          0.16
169       2       43.80         43.07          0.73
170       5       38.68         37.93          0.75
171       5       38.68         37.77          0.91
172       25      46.30         46.06          0.24
173       25      46.30         45.69          0.61
174       28      51.20         52.25          −1.05
175       28      51.20         50.87          0.33
176       29      48.10         47.31          0.79
177       29      48.10         47.73          0.37
178       2       43.80         44.04          −0.24
179       2       43.80         43.78          0.02
180       5       38.68         39.36          −0.68
181       5       38.68         39.18          −0.50
182       25      46.30         45.87          0.43
183       25      46.30         46.47          −0.17
184       28      51.20         51.50          −0.30
185       28      51.20         51.30          −0.10
186       29      48.10         47.15          0.95
187       29      48.10         47.88          0.22
188       2       43.80         43.72          0.08
189       2       43.80         43.91          −0.11
190       5       38.68         39.00          −0.32
191       5       38.68         38.47          0.21
192       25      46.30         46.18          0.12
193       25      46.30         46.58          −0.28
194       28      51.20         51.08          0.12
195       28      51.20         51.68          −0.48
196       29      48.10         47.38          0.72
197       29      48.10         47.67          0.43

[0265] Model 11.1 was a multi-instrument global property model (block 104, FIG. 5) constructed to predict the oleic oil content using a refined filter (block 160, FIG. 6) of two spectral subregions, 4246.5 cm⁻¹ to 4601.4 cm⁻¹ and 5449.9 cm⁻¹ to 7501.8 cm⁻¹. Data pretreatment also included a first-derivative transformation with 17 smoothing points followed by vector normalization. Cross-validation of Model 11.1 gave 98.01 for R² and 0.525% for RMSECV with a rank of 14 within the concentration range from 63.00% to 75.72%. The calibration curve for Model 11.1 is shown in FIG. 36. The observed and predicted values from the extended training set of Model 11.1 for oleic oil are in Table 14.

TABLE 14
Spectrum  Sample  Observed (%)  Predicted (%)  Residual (Obs − Pred)
1         1       72.82         72.59          0.23
2         1       72.82         72.33          0.49
3         1       72.82         72.72          0.10
4         2       74.25         74.14          0.11
5         2       74.25         74.55          −0.30
6         2       74.25         74.41          −0.16
7         5       69.62         69.15          0.47
8         5       69.62         70.44          −0.82
9         5       69.62         70.46          −0.84
10        6       74.25         73.86          0.39
11        6       74.25         74.02          0.23
12        6       74.25         74.17          0.08
13        7       73.91         73.29          0.62
14        7       73.91         73.52          0.39
15        7       73.91         73.19          0.72
16        8       73.22         72.49          0.73
17        8       73.22         72.45          0.77
18        8       73.22         72.71          0.51
19        9       71.18         71.45          −0.27
20        9       71.18         71.22          −0.04
21        9       71.18         71.83          −0.65
22        10      74.64         74.35          0.29
23        10      74.64         74.67          −0.03
24        10      74.64         74.63          0.01
25        11      74.59         74.47          0.12
26        11      74.59         74.65          −0.06
27        11      74.59         74.77          −0.18
28        12      74.39         74.48          −0.09
29        12      74.39         74.47          −0.08
30        12      74.39         74.27          0.12
31        13      73.48         73.31          0.17
32        13      73.48         73.58          −0.10
33        13      73.48         73.54          −0.06
34        14      73.98         73.87          0.11
35        14      73.98         73.35          0.63
36        14      73.98         73.53          0.45
37        15      73.49         73.58          −0.09
38        15      73.49         73.20          0.29
39        15      73.49         73.16          0.33
40        16      75.01         74.37          0.64
41        16      75.01         74.00          1.01
42        16      75.01         74.29          0.72
43        17      71.71         72.41          −0.70
44        17      71.71         72.57          −0.86
45        17      71.71         72.99          −1.28
46        18      73.87         74.35          −0.48
47        18      73.87         74.10          −0.23
48        18      73.87         73.47          0.40
49        19      68.05         68.20          −0.15
50        19      68.05         68.34          −0.29
51        19      68.05         68.42          −0.37
52        20      71.87         71.59          0.28
53        20      71.87         71.92          −0.05
54        20      71.87         71.82          0.05
55        21      74.96         74.67          0.29
56        21      74.96         74.91          0.05
57        21      74.96         74.63          0.33
58        22      70.74         70.79          −0.05
59        22      70.74         70.66          0.08
60        22      70.74         70.31          0.43
61        23      71.68         71.50          0.18
62        23      71.68         71.53          0.15
63        23      71.68         71.97          −0.29
64        24      70.54         70.32          0.22
65        24      70.54         70.31          0.23
66        24      70.54         70.04          0.50
67        25      63.00         62.45          0.55
68        25      63.00         62.83          0.17
69        25      63.00         62.65          0.35
70        26      75.18         76.22          −1.04
71        26      75.18         76.31          −1.13
72        28      73.89         73.36          0.53
73        28      73.89         73.91          −0.02
74        28      73.89         74.07          −0.18
75        29      75.72         76.04          −0.32
76        29      75.72         76.62          −0.90
77        29      75.72         75.26          0.46
78        30      73.67         74.35          −0.68
79        30      73.67         73.46          0.21
80        30      73.67         73.96          −0.29
81        31      73.88         73.22          0.66
82        31      73.88         73.81          0.07
83        31      73.88         74.13          −0.25
84        32      71.89         71.06          0.83
85        32      71.89         71.28          0.61
86        32      71.89         71.29          0.60
87        33      67.13         67.48          −0.35
88        33      67.13         67.29          −0.16
89        33      67.13         67.29          −0.16
90        34      66.89         66.99          −0.10
91        34      66.89         67.17          −0.28
92        34      66.89         67.31          −0.42
93        35      72.72         73.84          −1.12
94        35      72.72         73.52          −0.80
95        35      72.72         73.50          −0.78
96        36      71.12         71.03          0.09
97        36      71.12         71.04          0.08
98        36      71.12         70.85          0.27
99        37      71.39         70.96          0.43
100       37      71.39         70.77          0.62
101       37      71.39         71.25          0.14
102       41      71.43         71.26          0.17
103       41      71.43         71.24          0.19
104       42      72.36         71.80          0.56
105       42      72.36         71.79          0.57
106       43      73.67         73.91          −0.24
107       43      73.67         73.69          −0.02
108       43      73.67         73.89          −0.22
109       28      73.89         73.20          0.69
110       28      73.89         73.36          0.53
111       29      75.72         75.53          0.19
112       29      75.72         75.35          0.37
113       29      75.72         75.49          0.23
114       2       74.25         74.85          −0.60
115       2       74.25         74.71          −0.46
116       2       74.25         74.73          −0.48
117       5       69.62         70.04          −0.42
118       5       69.62         69.40          0.22
119       5       69.62         69.98          −0.36
120       25      63.00         63.24          −0.24
121       25      63.00         63.15          −0.15
122       25      63.00         63.61          −0.61
123       28      73.89         74.44          −0.55
124       28      73.89         74.10          −0.21
125       28      73.89         73.94          −0.05
126       29      75.72         75.02          0.70
127       29      75.72         75.58          0.14
128       29      75.72         75.25          0.47
129       2       74.25         74.50          −0.25
130       2       74.25         74.10          0.15
131       2       74.25         74.58          −0.33
132       5       69.62         70.21          −0.59
133       5       69.62         70.60          −0.98
134       5       69.62         70.32          −0.70
135       25      63.00         63.22          −0.22
136       25      63.00         63.22          −0.22
137       25      63.00         62.80          0.20
138       5       69.78         70.79          −1.01
139       5       69.78         70.81          −1.03
140       5       69.78         70.13          −0.35
141       5       69.78         70.18          −0.40
142       5       69.78         70.20          −0.42
143       5       69.78         70.25          −0.47
144       5       69.78         70.29          −0.51
145       5       69.78         70.23          −0.45
146       2       74.25         73.69          0.56
147       2       74.25         73.65          0.60
148       25      63.00         63.30          −0.30
149       25      63.00         63.74          −0.74
150       28      73.89         74.01          −0.12
151       28      73.89         74.34          −0.45
152       29      75.72         75.88          −0.16
153       2       74.25         73.82          0.43
154       5       69.62         70.12          −0.50
155       5       69.62         69.56          0.06
156       25      63.00         62.94          0.06
157       25      63.00         61.93          1.07
158       28      73.89         74.11          −0.22
159       28      73.89         72.76          1.13
160       29      75.72         74.64          1.08
161       29      75.72         76.00          −0.28
162       2       74.25         73.84          0.41
163       2       74.25         75.07          −0.82
164       5       69.62         69.87          −0.25
165       5       69.62         69.87          −0.25
166       25      63.00         62.91          0.09
167       25      63.00         63.00          0.00
168       28      73.89         73.15          0.74
169       28      73.89         73.22          0.67
170       29      75.72         75.71          0.01
171       29      75.72         75.85          −0.13
172       5       69.62         68.27          1.35
173       5       69.62         68.23          1.39
174       28      73.89         72.56          1.33
175       28      73.89         74.08          −0.19
176       2       74.25         75.83          −1.58
177       2       74.25         74.79          −0.54
178       25      63.00         62.74          0.26
179       25      63.00         63.94          −0.94
180       29      75.72         76.12          −0.40
181       29      75.72         75.33          0.39
182       2       74.25         74.24          0.01
183       2       74.25         74.63          −0.38
184       5       69.62         69.57          0.05
185       5       69.62         69.56          0.06
186       25      63.00         64.02          −1.02
187       25      63.00         62.89          0.11
188       28      73.89         74.12          −0.23
189       28      73.89         74.57          −0.68
190       29      75.72         75.45          0.27
191       29      75.72         75.65          0.07
192       25      63.00         62.23          0.77
193       25      63.00         63.23          −0.23
194       29      75.72         75.66          0.06
195       29      75.72         75.90          −0.18

[0266] Model 11.2 was a multi-instrument global property model (block 104, FIG. 5) constructed to predict linolenic oil content using a refined filter (block 160, FIG. 6) of one spectral subregion, 4616.8 cm⁻¹ to 6067.0 cm⁻¹. Data pretreatment also included a first-derivative transformation with 17 smoothing points followed by vector normalization. Cross-validation of Model 11.2 gave 96.79 for R² and 0.262% for RMSECV with a rank of 13 within the concentration range from 1.88% to 7.82%. The calibration curve for Model 11.2 is shown in FIG. 37. The observed and predicted values from the extended training set of Model 11.2 for linolenic oil are in Table 15.

TABLE 15
Spectrum  Sample  Observed (%)  Predicted (%)  Residual (Obs − Pred)
1         1       2.75          3.27           −0.52
2         1       2.75          2.95           −0.20
3         1       2.75          3.00           −0.25
4         2       2.67          2.28           0.39
5         2       2.67          2.50           0.17
6         2       2.67          2.49           0.18
7         3       3.92          4.28           −0.36
8         3       3.92          3.94           −0.02
9         3       3.92          3.71           0.21
10        4       4.05          4.02           0.03
11        4       4.05          4.10           −0.05
12        4       4.05          3.82           0.23
13        5       3.57          3.47           0.10
14        5       3.57          3.54           0.03
15        5       3.57          3.62           −0.05
16        6       2.93          2.95           −0.02
17        6       2.93          2.63           0.30
18        6       2.93          2.93           0.00
19        7       2.79          2.72           0.07
20        7       2.79          2.65           0.14
21        7       2.79          2.68           0.11
22        8       2.86          2.97           −0.11
23        8       2.86          2.85           0.01
24        8       2.86          3.00           −0.14
25        9       3.19          3.23           −0.04
26        9       3.19          3.55           −0.36
27        9       3.19          3.42           −0.23
28        10      2.40          2.41           −0.01
29        10      2.40          2.34           0.06
30        10      2.40          2.25           0.15
31        11      2.68          2.65           0.03
32        11      2.68          2.43           0.25
33        11      2.68          2.74           −0.06
34        12      2.65          2.98           −0.33
35        12      2.65          2.76           −0.11
36        12      2.65          2.66           −0.01
37        13      2.54          2.68           −0.14
38        13      2.54          2.45           0.09
39        13      2.54          2.86           −0.32
40        14      2.70          2.66           0.04
41        14      2.70          2.59           0.11
42        14      2.70          2.59           0.11
43        15      3.00          2.79           0.21
44        15      3.00          2.85           0.15
45        15      3.00          2.85           0.15
46        16      2.48          2.64           −0.16
47        16      2.48          2.54           −0.06
48        16      2.48          2.35           0.13
49        17      3.16          2.90           0.26
50        17      3.16          3.10           0.06
51        17      3.16          2.98           0.18
52        18      2.92          2.38           0.54
53        18      2.92          2.41           0.51
54        18      2.92          2.80           0.12
55        20      3.19          3.82           −0.63
56        20      3.19          3.72           −0.53
57        20      3.19          3.74           −0.55
58        21      2.59          2.61           −0.02
59        21      2.59          2.53           0.06
60        21      2.59          2.80           −0.21
61        22      3.37          3.63           −0.26
62        22      3.37          3.46           −0.09
63        22      3.37          3.87           −0.50
64        23      3.33          3.37           −0.04
65        23      3.33          3.33           0.00
66        23      3.33          3.38           −0.05
67        24      4.31          4.40           −0.09
68        24      4.31          3.99           0.32
69        24      4.31          4.06           0.25
70        25      7.82          7.87           −0.05
71        25      7.82          7.71           0.11
72        25      7.82          7.68           0.14
73        26      2.57          2.15           0.42
74        26      2.57          2.12           0.45
75        26      2.57          2.37           0.20
76        27      1.88          2.09           −0.21
77        27      1.88          1.76           0.12
78        27      1.88          1.99           −0.11
79        28      3.22          3.27           −0.05
80        28      3.22          3.32           −0.10
81        28      3.22          3.40           −0.18
82        29      2.58          2.68           −0.10
83        29      2.58          2.37           0.21
84        29      2.58          2.74           −0.16
85        30      2.77          2.60           0.17
86        30      2.77          2.56           0.21
87        30      2.77          2.54           0.23
88        31      2.79          2.88           −0.09
89        31      2.79          3.12           −0.33
90        31      2.79          2.87           −0.08
91        32      2.85          3.28           −0.43
92        32      2.85          3.26           −0.41
93        32      2.85          3.06           −0.21
94        33      4.17          4.27           −0.10
95        33      4.17          4.33           −0.16
96        33      4.17          4.21           −0.04
97        34      6.17          5.89           0.28
98        34      6.17          5.88           0.29
99        34      6.17          5.78           0.39
100       35      2.95          2.84           0.11
101       35      2.95          2.93           0.02
102       35      2.95          3.01           −0.06
103       36      3.38          3.41           −0.03
104       36      3.38          3.38           0.00
105       36      3.38          3.34           0.04
106       37      3.40          3.61           −0.21
107       37      3.40          3.44           −0.04
108       37      3.40          3.36           0.04
109       38      2.79          2.32           0.47
110       38      2.79          2.22           0.57
111       39      3.04          3.06           −0.02
112       39      3.04          2.97           0.07
113       39      3.04          3.04           0.00
114       41      3.08          3.29           −0.21
115       41      3.08          3.33           −0.25
116       42      2.90          2.83           0.07
117       42      2.90          2.86           0.04
118       42      2.90          3.01           −0.11
119       43      3.31          3.30           0.01
120       43      3.31          3.29           0.02
121       43      3.31          2.91           0.40
122       44      2.64          2.40           0.24
123       44      2.64          2.44           0.20
124       44      2.64          2.50           0.14
125       28      3.22          3.78           −0.56
126       28      3.22          3.61           −0.39
127       28      3.22          3.31           −0.09
128       29      2.58          2.59           −0.01
129       29      2.58          2.70           −0.12
130       29      2.58          2.82           −0.24
131       27      1.88          2.35           −0.47
132       27      1.88          2.27           −0.39
133       27      1.88          2.29           −0.41
134       2       2.67          2.35           0.32
135       2       2.67          2.34           0.33
136       2       2.67          2.65           0.02
137       5       3.57          3.11           0.46
138       5       3.57          3.36           0.21
139       5       3.57          3.21           0.36
140       25      7.82          7.32           0.50
141       25      7.82          7.31           0.51
142       25      7.82          6.99           0.83
143       28      3.22          3.09           0.13
144       28      3.22          3.00           0.22
145       28      3.22          3.16           0.06
146       29      2.58          2.49           0.09
147       29      2.58          2.42           0.16
148       29      2.58          2.33           0.25
149       27      1.88          2.14           −0.26
150       27      1.88          2.13           −0.25
151       27      1.88          2.01           −0.13
152       2       2.67          2.71           −0.04
153       2       2.67          2.74           −0.07
154       2       2.67          2.74           −0.07
155       5       3.57          3.58           −0.01
156       5       3.57          3.53           0.04
157       5       3.57          3.62           −0.05
158       25      7.82          7.61           0.21
159       25      7.82          8.08           −0.26
160       25      7.82          8.22           −0.40
161       2       2.67          2.61           0.06
162       2       2.67          2.52           0.15
163       5       3.57          3.19           0.38
164       5       3.57          3.59           −0.02
165       25      7.82          8.09           −0.27
166       25      7.82          7.79           0.03
167       28      3.22          3.56           −0.34
168       28      3.22          3.55           −0.33
169       29      2.58          2.26           0.32
170       29      2.58          2.51           0.07
171       2       2.67          2.25           0.42
172       2       2.67          2.47           0.20
173       5       3.57          3.83           −0.26
174       5       3.57          4.14           −0.57
175       25      7.82          7.80           0.02
176       25      7.82          7.57           0.25
177       28      3.22          3.17           0.05
178       28      3.22          2.62           0.60
179       29      2.58          2.37           0.21
180       29      2.58          2.68           −0.10
181       2       2.67          2.62           0.05
182       2       2.67          2.99           −0.32
183       5       3.57          3.50           0.07
184       5       3.57          3.74           −0.17
185       25      7.82          7.41           0.41
186       25      7.82          8.11           −0.29
187       28      3.22          3.25           −0.03
188       28      3.22          3.49           −0.27
189       29      2.58          2.48           0.10
190       29      2.58          2.35           0.23
191       2       2.67          2.91           −0.24
192       2       2.67          2.82           −0.15
193       5       3.57          3.85           −0.28
194       5       3.57          3.95           −0.38
195       25      7.82          7.62           0.20
196       25      7.82          7.91           −0.09
197       28      3.22          3.70           −0.48
198       28      3.22          3.20           0.02
199       29      2.58          2.73           −0.15
200       29      2.58          2.47           0.11

[0267] Occasional missing data in Tables 13 to 15 resulted from omitting observations from the training sets that were identified as bad outliers (block 106, FIG. 5). Such outliers occurred when the quality of an acquired spectrum was poor due to unexpected interruptions in the grain flow during data acquisition, or when an inaccuracy in the reference data on specific properties of some samples was subsequently identified and could not be corrected.

[0268] This example demonstrates that it is possible to build multivariate calibrations over a wide range of expected temperatures (from about −60 to about +50° C.), but including calibration data over such a wide temperature range tends to decrease the precision of predicted results. If it is desired or necessary to increase the precision of predicted results beyond that used in this example, and if it is possible to precondition the samples that will be measured at remote locations to a narrower range of temperatures, or otherwise ensure that they fall within that range, and thereby adjust the objectives of block 82 in FIG. 4, then other models can be built from training sets that span a narrower range of temperatures.

[0269] The above three property models were validated using 50 validation samples (block 108, FIG. 5) not used in the training sets by taking measurements over a range of measurement conditions with two Bruker MATRIX Model F FT-NIR instruments in Sensors A and B that had not been used previously. The validation results (block 110, FIG. 5) are shown in Table 16. No outliers were detected in the validation set (block 112, FIG. 5).

TABLE 16

Property             Range         RMSEP of Sensor A   R² of Sensor A   RMSEP of Sensor B   R² of Sensor B
Wt % total oil       40.0-50.19    0.52                96.08            0.55                95.47
Wt % oleic oil       66.85-76.00   0.53                95.04            0.56                94.21
Wt % linolenic oil   1.88-7.50     0.25                96.85            0.26                96.92

[0270] These results showed that the RMSEP values from the validation sets for the three properties were each close to the RMSECV values of the corresponding property models, that each was within the desired upper precision limit of 0.6%, and that differences in the predicted values from different instruments were not significant. Models 11.0, 11.1 and 11.2 were thus considered to be ready for installation (block 114, FIG. 5) in the central processor 10.
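
The acceptance check described above can be sketched as follows. This is a minimal illustration only, assuming the usual definition of RMSEP as the square root of the mean squared difference between observed (reference) and predicted values. The function names and the four value pairs are hypothetical and are not data from Table 16; only the 0.6% limit comes from the text.

```python
import math


def rmsep(observed, predicted):
    """Root mean square error of prediction over paired values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def within_precision(observed, predicted, limit=0.6):
    """True when the RMSEP does not exceed the upper precision limit."""
    return rmsep(observed, predicted) <= limit


# Hypothetical validation pairs (wt %), for illustration only.
observed = [3.0, 4.0, 5.0, 6.0]
predicted = [3.5, 3.5, 5.5, 5.5]

print(rmsep(observed, predicted))             # 0.5
print(within_precision(observed, predicted))  # True
```

A model whose validation RMSEP exceeded the limit would, under this sketch, not be passed on for installation.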

Example 12

[0271] The Mahalanobis distance can be used to identify bad outliers which may arise from invalid measurements. For Examples 12 to 15, a threshold value of the Mahalanobis distance for good outliers was calculated by OPUS Quant-2 to be 0.42. The threshold values for bad outliers and for extremely bad outliers were taken to be 1.0 and 100.0, respectively.

[0272] FIG. 38 shows an abnormal FT-NIR spectrum acquired after an instrument malfunction which caused the excitation source to fail. Model 11.1 predicted that the oleic oil in the sample was 199.0%. Since the Mahalanobis distance for this spectrum was 390.00, the predicted value was correctly identified as an extremely bad outlier and the corresponding measurement results were considered invalid.
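
The threshold logic of Example 12 can be sketched as a simple classification of a Mahalanobis distance. The three threshold values are those stated for Examples 12 to 15. The function name, the label strings, and the treatment of a distance exactly equal to a threshold (here treated as not exceeding it) are assumptions made for illustration.

```python
# Thresholds stated for Examples 12 to 15.
GOOD_OUTLIER_THRESHOLD = 0.42
BAD_OUTLIER_THRESHOLD = 1.0
EXTREMELY_BAD_OUTLIER_THRESHOLD = 100.0


def classify(mahalanobis_distance):
    """Label a prediction by its Mahalanobis distance.

    Assumption: a distance must strictly exceed a threshold to fall
    into the corresponding outlier class.
    """
    if mahalanobis_distance > EXTREMELY_BAD_OUTLIER_THRESHOLD:
        return "extremely bad outlier"
    if mahalanobis_distance > BAD_OUTLIER_THRESHOLD:
        return "bad outlier"
    if mahalanobis_distance > GOOD_OUTLIER_THRESHOLD:
        return "good outlier"
    return "not an outlier"


# The failed-excitation-source spectrum of FIG. 38.
print(classify(390.00))  # extremely bad outlier
```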

Example 13

[0273] The Mahalanobis distance can be used to identify bad outliers which may arise from invalid sample presentation.

[0274] FIG. 39 shows two NIR spectra acquired on the same sample of canola seeds using the flow-through sample presentation system described in Example 11. The known value of oleic oil for this sample was 73.1%. The upper spectrum was taken on a 250 gram sample (a valid sample size according to the method of Example 11), while the lower spectrum was taken on a 100 gram sample (an invalid sample size according to the method of Example 11). Model 11.1 predicted that the oleic oil from the upper spectrum was 73.0% while that of the lower spectrum was 66.1%. Since the Mahalanobis distance of the upper spectrum was 0.25 while that of the lower spectrum was 5.00, the upper spectrum was a presumably valid measurement, while the lower spectrum was correctly identified as a bad outlier and the corresponding measurement results were considered invalid.

Example 14

[0275] The Mahalanobis distance can be used to identify bad outliers which may arise from valid measurements on samples taken from populations different from that of the training set.

[0276] FIG. 40 shows an NIR spectrum of a sample of wheat using the sample presentation system of Example 11. Model 11.1 predicted that the oleic oil in the sample was 33.2%. Since the Mahalanobis distance of this spectrum was 44.00, the predicted value was correctly identified as a bad outlier and the corresponding measurement results were considered invalid.

Example 15

[0277] The Mahalanobis distance can be used to identify good outliers arising from valid measurements on samples with secondary material characteristics that differ in some characteristic way from the secondary material characteristics of samples included in the training set.

[0278] FIG. 41 shows six NIR spectra from three different samples of Variety B canola seeds, with duplicate spectra taken for each sample using the sample presentation system of Example 11. Variety B was characteristically different from Variety A, which was the variety of canola used to develop Model 11.1. The observed and predicted values of oleic oil in these three samples according to Model 11.1 are summarized in Table 17. Since the Mahalanobis distance of each spectrum was greater than the threshold value for good outliers, all predicted results on Variety B canola were good outliers and the corresponding measurement results were considered invalid.

TABLE 17

Sample   Observed   Predicted      MAH
S1       72.2%      73.3%, 73.0%   0.72, 0.71
S2       70.6%      71.8%, 72.3%   0.71, 0.73
S3       71.3%      72.6%, 72.2%   0.86, 0.69

[0279] Model 15.0 was a property model constructed by including two spectra from sample S1 in the training set of Model 11.1 while maintaining the corresponding refined filter and pretreatment transformations. The results predicted from Model 15.0 using a validation set containing four spectra from samples S2 and S3 are given in Table 18.

TABLE 18

Sample   Observed   Predicted      MAH
S2       70.6%      70.9%, 71.5%   0.29, 0.33
S3       71.3%      71.5%, 71.6%   0.33, 0.29

[0280] Since the Mahalanobis distances in Table 18 were each less than the threshold for good outliers, the predicted results on Variety B canola are no longer probable outliers. The measurement results in Table 18 are considered valid, and Model 15.0 compensates for a wider range of influential factors.

Example 16

[0281] A collection of Bruker MATRIX Model F FT-NIR instruments, sample presentation devices of the type shown in FIG. 33, and laptop computers loaded with Microsoft® Windows® 2000 and the Bruker OPUS 3.01 software were transported to several sites in Canada, remote from the central processor residing in Cincinnati, Ohio. Two remote analysis systems were assembled at two separate sites in Manitoba, Canada by separate personnel, with each sensor 2 comprising one each of the NIR instrument 30, equipped with a sample presentation device 22 (shown in FIG. 33) and a laptop computer serving as the local processor 34. Using appropriate communications software on the local processor 34, the systems were connected to their own separate local area networks with Internet connectivity.

[0282] Once Internet connectivity was established, operators at both sites initiated secure connections to the central processor 10, using a graphical user interface on the local processor 34, and entered device-specific unique user identification codes and passwords for approval by the security controller 54. Successful connection to the central processor 10 typically occurred within one minute.

[0283] Upon connectivity to the central processor 10, the user interface 32 prompted for sample identification. This remote user-supplied sample identification data, along with the subsequently acquired multichannel data, was transmitted to the central processor 10 and used by the analysis engine 56 to select the appropriate parameters from the data repository 58 for three multi-instrument global property models: Model 11.0 for total oil, Model 11.1 for oleic content, and Model 11.2 for linolenic content.

[0284] After entering the sample identification data, the operators placed about 250 grams of whole canola seed samples into the funnel 202 of the sample presentation device 22. The funnel gate 208 was opened and the instrument 200 was then activated by the user interface 32 and the flow of the canola was initiated through the sample chamber 24 past the fiber optic probe 204. Within seconds after the sample flow had completed, the measurement data 12 was transmitted over the secure Internet connection 8 to the central processor 10. The analysis engine 56 computed values for the properties of interest based on Models 11.0, 11.1 and 11.2 (block 116, FIG. 5), and after testing for outliers (block 118, FIG. 5) the measurement results 14 were sent back to the individual sensors 2 (block 120, FIG. 5) via Internet connection 8, all without additional remote user action. The measurement data 12 and the measurement results 14 were stored in the data repository 58. The elapsed time from initial user interface prompt to display of predicted values at an output device of the user interface 32 was generally from about 1 to about 2 minutes. The time interval between transmission of the measurement data 12 and transmission of the measurement results 14 was generally less than one minute.
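
The server-side sequence described above (select the property models for the identified sample, compute the predicted values, test for outliers, archive, and return the results) can be sketched as follows. This is a hedged illustration and not the analysis engine 56 itself: every name in it (`analyze`, `MeasurementResult`, `repository`, `archive`) is hypothetical, the predictors are stand-ins that return fixed numbers rather than multivariate calibrations, and the use of the 0.42 threshold from Examples 12 to 15 is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# A property model maps multichannel data to
# (predicted value, Mahalanobis distance). Hypothetical interface.
Predictor = Callable[[Sequence[float]], Tuple[float, float]]


@dataclass
class MeasurementResult:
    property_name: str
    predicted: float
    mahalanobis: float
    valid: bool


def analyze(material: str,
            spectrum: Sequence[float],
            repository: Dict[str, Dict[str, Predictor]],
            archive: List[tuple],
            outlier_threshold: float = 0.42) -> List[MeasurementResult]:
    """Predict each property of interest and flag probable outliers."""
    results = []
    # Select the models registered for this kind of sample.
    for property_name, predict in repository[material].items():
        predicted, distance = predict(spectrum)
        valid = distance <= outlier_threshold
        results.append(
            MeasurementResult(property_name, predicted, distance, valid))
    # Store measurement data and results together for later review.
    archive.append((material, list(spectrum), results))
    return results


# Stand-in predictors with fixed outputs, for illustration only.
repository = {
    "canola": {
        "total oil": lambda s: (45.3, 0.1),
        "oleic": lambda s: (73.1, 0.2),
        "linolenic": lambda s: (2.4, 0.6),
    }
}
archive: List[tuple] = []

for r in analyze("canola", [0.1, 0.2, 0.3], repository, archive):
    print(r.property_name, r.predicted, "valid" if r.valid else "outlier")
```

Keeping the models in a repository on the central side means that a sensor only supplies identification and multichannel data, which matches the description that results are returned without additional remote user action.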

[0285] Table 19 lists the measurement results 14 generated from sensors A and B during a short time interval when both sensors were performing near-simultaneous analyses. Table 19 is an example of a report of historical information including two or more measurement results 14 acquired from at least one data acquisition device 2 that was transmitted to at least one user interface as aggregated results.

TABLE 19

Test No.   Submission Time   Location   Sensor   % Total Oil   Total Oil MAH   % Oleic   Oleic MAH   % Linolenic   Linolenic MAH
1          12:29             Manitoba   A        45.3          0.1             73.1      0.2         3.2           0.3
2          12:31             Manitoba   A        45.0          0.1             72.6      0.3         3.2           0.1
3          12:31             Manitoba   B        50.5          0.2             76.9      0.2         2.4           0.6
4          12:32             Manitoba   A        45.2          0.1             73.3      0.3         4.0           0.2
5          12:33             Manitoba   B        50.5          0.3             77.1      0.2         1.9           0.7

[0286] Two of the predicted results in Table 19, test numbers 3 and 5 for linolenic content, were identified as good outliers (blocks 118, 122, and 124 in FIG. 5). Investigation revealed that the corresponding samples were a different variety of canola than that used to develop the calibration models. If during customer consultation it had been learned that the new variety was an experimental crop that would no longer be produced, it might be decided not to extend the training set (block 126, FIG. 5) so any additional samples of the experimental variety would also be considered invalid during future on-site measurements (block 116, FIG. 5). If, however, it had been learned that the new variety was scheduled for continued production, then it could be decided to extend the training set (block 126, FIG. 5) so future measurements using a new property model (developed according to blocks 102, 104, 106, 108, 110, 112 as well as blocks 124, 126, 128, and 130 as required) installed on the central processor 10 (block 114, FIG. 5) would provide valid predictions from on-site measurements (block 116, FIG. 5) for both new and old varieties of canola.
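
The screening that singles out test numbers 3 and 5 can be reproduced from the values reported in Table 19. The sketch below assumes that the good-outlier threshold of 0.42 stated for Examples 12 to 15 also applies here; the data structure and function name are hypothetical.

```python
# Assumed: same good-outlier threshold as in Examples 12 to 15.
GOOD_OUTLIER_THRESHOLD = 0.42

# Values from Table 19: (test no., sensor, {property: (predicted %, MAH)}).
TABLE_19 = [
    (1, "A", {"total oil": (45.3, 0.1), "oleic": (73.1, 0.2), "linolenic": (3.2, 0.3)}),
    (2, "A", {"total oil": (45.0, 0.1), "oleic": (72.6, 0.3), "linolenic": (3.2, 0.1)}),
    (3, "B", {"total oil": (50.5, 0.2), "oleic": (76.9, 0.2), "linolenic": (2.4, 0.6)}),
    (4, "A", {"total oil": (45.2, 0.1), "oleic": (73.3, 0.3), "linolenic": (4.0, 0.2)}),
    (5, "B", {"total oil": (50.5, 0.3), "oleic": (77.1, 0.2), "linolenic": (1.9, 0.7)}),
]


def flag_outliers(rows, threshold=GOOD_OUTLIER_THRESHOLD):
    """Return (test no., sensor, property, MAH) for each probable outlier."""
    flagged = []
    for test_no, sensor, properties in rows:
        for name, (_value, mah) in properties.items():
            if mah > threshold:
                flagged.append((test_no, sensor, name, mah))
    return flagged


for item in flag_outliers(TABLE_19):
    print(item)
# (3, 'B', 'linolenic', 0.6)
# (5, 'B', 'linolenic', 0.7)
```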

[0287] Additionally, as demonstrated by the data in Table 20, analyses were generated on a different date after transporting sensor A to a third remote site in Saskatchewan, Canada, with measurement results 14 being returned to user interface 32 at the third site, which submitted data to the central processor 10 with brief lapses over the course of about 30 minutes.

TABLE 20

Test No.   Submission Time   Location       Sensor   % Total Oil   Total Oil MAH   % Oleic   Oleic MAH   % Linolenic   Linolenic MAH
1          12:32             Saskatchewan   A        47.6          0.2             76.5      0.2         2.1           0.2
2          12:35             Saskatchewan   A        41.6          0.2             72.3      0.2         2.8           0.2
3          12:37             Saskatchewan   A        47.9          0.2             75.7      0.1         2.3           0.3
4          12:40             Saskatchewan   A        47.6          0.2             75.3      0.1         2.1           0.2
5          12:41             Saskatchewan   A        44.1          0.1             76.0      0.1         2.2           0.2
6          12:44             Saskatchewan   A        48.0          0.3             73.7      0.2         2.7           0.2
7          12:46             Saskatchewan   A        41.1          0.1             70.0      0.1         3.1           0.2
8          12:51             Saskatchewan   A        40.8          0.2             71.1      0.1         3.1           0.2
9          12:54             Saskatchewan   A        45.5          0.2             74.5      0.1         2.3           0.3
10         12:56             Saskatchewan   A        46.8          0.2             75.2      0.2         2.1           0.2
11         12:58             Saskatchewan   A        47.5          0.1             76.7      0.1         1.7           0.3
12         13:00             Saskatchewan   A        42.8          0.1             74.2      0.1         2.5           0.2
13         13:02             Saskatchewan   A        48.9          0.2             76.7      0.1         2.0           0.3
14         13:04             Saskatchewan   A        47.1          0.2             76.3      0.1         2.1           0.2

[0288] The on-site analysis system is simple to operate at each location where data is acquired. Because only one property model algorithm is needed for each property of interest as specified in the method and objectives of block 70 in FIG. 4, and each of these is stored on the central processor, the operator need only have the sample 20 ready to be detected at the sample presentation device 22 and initiate the data acquisition by following the prompts which appear at the user interface 32. A semiskilled, or even non-skilled operator is able to perform the steps needed to acquire data on the sample.

[0289] Also, the analysis system and method enable the customer to submit guidelines, via an appropriate security code to the security controller 54 of the central processor 10, to be stored in the data repository 58. These guidelines annotate the measurement results 14 with customer-specific interpretations and help text. Thus, the customer could independently specify whether a particular range of predicted values is a “Pass” or “Fail” for the property of interest. The customer can instruct the central processor 10 to transmit this annotation as appropriate for particular predicted results generated by the property model algorithm. It is also possible for the customer to command the central processor 10 from an off-site user interface 158 to generate custom spreadsheets of historical results for forecasting or quality assurance purposes.
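
A customer-supplied guideline of the “Pass” or “Fail” kind can be sketched as a lookup from a predicted value to an annotation. The ranges shown are hypothetical and do not come from the specification, as are the names `Guideline` and `annotate`; the sketch only illustrates attaching a customer-specific interpretation to a predicted result.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Guideline:
    property_name: str
    low: float         # inclusive lower bound of the range
    high: float        # exclusive upper bound of the range
    annotation: str    # customer-specific interpretation


def annotate(property_name: str,
             predicted: float,
             guidelines: List[Guideline]) -> Optional[str]:
    """Return the annotation of the first matching range, if any."""
    for g in guidelines:
        if g.property_name == property_name and g.low <= predicted < g.high:
            return g.annotation
    return None


# Hypothetical customer guidelines, for illustration only.
guidelines = [
    Guideline("oleic", 70.0, float("inf"), "Pass"),
    Guideline("oleic", float("-inf"), 70.0, "Fail"),
]

print(annotate("oleic", 73.1, guidelines))  # Pass
print(annotate("oleic", 66.1, guidelines))  # Fail
```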

[0290] The on-site analysis system and method of analysis can be used in a range of applications for obtaining information on a number of materials. A multi-instrument global property model which can be refined to a high degree of precision, as necessary, and coupled with substantial immunity to instrument, sample, environmental, and sample presentation variance can be used to produce measurement results 14 which are accepted within a particular trade. Thus, it is possible that an on-site analysis system and method of analysis can be devised with proper attention to inclusion of the requisite variables for the creation of a method whereby results can be certified by a sanctioning body. Because the processing of data received from individual sensors 2, 4, 6 is conducted by a single property model algorithm of the central processor 10 for each property of interest without instrument-specific calibration transfer, predicted results from a number of sensors of the same sensor-type can be directly compared and certified.

[0291] It is possible using this analysis system and method of analysis to compensate for individual sensor variation at the respective sampling locations. Thus, a global property model can be generated which not only compensates for changes to components in one sensor over time, such as the output of an excitation source, but also compensates for inherent differences between similarly constructed sensors. For the case of spectroscopic instruments, it is preferred that the instrument manufacturer generate a line of equipment with individual instruments being sufficiently similar, having minimized differences as to the following characteristics or components: light intensity; optical parts; alignment of the optical parts; detector performance; and wavelength accuracy.

[0292] An additional benefit of using a central processor generating intercomparable results lies in the value generated by archival analysis of historical information which is built up over time. In the case of analyzing oilseeds, for example, information can be stored concerning the specific location of a particular oilseed analysis with the predicted results of that analysis. At the time of the data acquisition, input fields of a user interface can include other information which would be uniquely beneficial for the particular material being analyzed, for example, as part of an electronic identity preservation system. Financial and crop output predictive studies may also utilize this information.

[0293] It is anticipated that the analytical system and method of analysis described herein can be utilized in a wide range of applications. These include a number of agriculture-related applications, such as the analysis of oilseed crops, the analysis of grain, electronic grading, farm chemicals blending, soil condition analysis, waste monitoring, plant nutrition analysis, single-seed analysis, determination of harvest readiness, manufacturing of animal feed and forage, dietary supplements, and raw milk and dairy products handling and processing. In the area of healthcare, the system and method can be used in blood analysis, biological sample analysis, skin disease diagnostics, non-invasive human and animal testing, and drug testing. In chemical manufacturing, the system and method can be used in raw material qualification, process control, quality assurance testing, in-process and finished jet, diesel, automotive fuel quality and identity, and effluent monitoring. In textiles, applications include raw materials qualification, fiber properties qualification, blending and application monitoring, and effluent monitoring. In surface treatment, applications include metal treatment analysis, metal wear measurement, coating thickness analysis, and adhesive application measurement. In consumer testing, applications include determination of the fat content of meat, ripeness of fruits and vegetables, automotive fluids check, fuel octane monitoring, exhaust monitoring and personal medical checks, such as for diabetes and cholesterol.

[0294] Thus it is apparent that there has been provided, in accordance with the invention, an analysis system, a method of analysis and a method of supplying analysis services to customers which fully satisfies the objects, aims, and advantages set forth above. While the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications, and variations will be apparent to those skilled in the art in light of the foregoing description. Accordingly, departures may be made from such details without departing from the spirit or scope of the general inventive concept.

What is claimed is:
1. A method of predicting a value of a property of interest of a material comprising: using a calibration model to process data obtained from a sample of the material, wherein the calibration model is configured to compensate for instrument variance in predicting the value of the property of interest.
2. The method of claim 1 wherein the calibration model is also configured to compensate for sample variance.
3. The method of claim 1 wherein the calibration model is also configured to compensate for environmental variance.
4. The method of claim 1 wherein the calibration model is also configured to compensate for sample presentation variance.
5. The method of claim 2 wherein the calibration model is also configured to compensate for environmental variance.
6. The method of claim 2 wherein the calibration model is also configured to compensate for sample presentation variance.
7. The method of claim 5 wherein the calibration model is also configured to compensate for sample presentation variance.
8. The method of claim 1 wherein the calibration model is configured to compensate for variations in orientation of an excitation source relative to the sample.
9. The method of claim 1 wherein the calibration model is configured to compensate for variations in age of an excitation source.
10. The method of claim 1 wherein the calibration model is configured to compensate for variations in mechanical alignment of optical components in the data acquisition device.
11. The method of claim 1 wherein the calibration model is configured to compensate for variations in the output of a detector.
12. The method of claim 1 wherein the calibration model is configured to compensate for variations in the output of an excitation source.
13. The method of claim 1 wherein the calibration model is configured to compensate for variations in transmission of a data acquisition probe.
14. The method of claim 1 wherein the data are transmitted over a communication link to a central processor.
15. The method of claim 14 wherein the data is acquired using a data acquisition device at a location remote from the central processor.
16. The method of claim 14 wherein the calibration model is configured to compensate for variance in more than one data acquisition device connectable to the central processor by the communication link.
17. The method of claim 1 further comprising pretreating the data.
18. The method of claim 14 further comprising pretreating the data after transmission over the communication link.
19. The method of claim 1 further comprising pre-processing the data.
20. The method of claim 1 wherein the data are pretreated with a refined filter.
21. The method of claim 14 wherein the data are pretreated with a refined filter after transmission over a communication link.
22. The method of claim 20 wherein the data are pre-processed.
23. The method of claim 21 wherein the data are pre-processed.
24. The method of claim 1 wherein the data are the result from electromagnetic radiation detected from the sample.
25. The method of claim 24 wherein the data are pre-processed.
26. The method of claim 24 wherein the data are pre-processed using Fourier transformation.
27. The method of claim 1 wherein the data are the result of non-stimulated emission radiation detected from the sample.
28. The method of claim 1 wherein the data are the result from the sample being subjected to mass spectrometry.
29. The method of claim 1 wherein the data are the result from the sample being subjected to chromatography.
30. The method of claim 14 further comprising transmitting a predicted value from the central processor to at least one user interface.
31. The method of claim 14 further comprising transmitting a predicted value to a user interface in the vicinity of the data acquisition device.
32. The method of claim 1 wherein the data are an accumulation of individual repetitive runs.
33. The method of claim 32 wherein the data are pre-processed.
34. The method of claim 1 wherein at least one probable outlier is used in connection with updating the calibration model.
35. The method of claim 1 wherein at least one probable outlier detected from one data acquisition device is used in connection with updating the calibration model for all data acquisition devices using the calibration model.
36. The method of claim 34 wherein the outlier is good.
37. The method of claim 35 wherein the outlier is good.
38. The method of claim 1 wherein the calibration model is configured to compensate for non-quantified instrument variance.
39. The method of claim 1 wherein the calibration model is also configured to compensate for non-quantified sample variance.
40. The method of claim 1 wherein the calibration model is also configured to compensate for non-quantified environmental variance.
41. The method of claim 1 wherein the calibration model is also configured to compensate for non-quantified sample presentation variance.
42. The method of claim 39 wherein the calibration model is also configured to compensate for non-quantified environmental variance.
43. The method of claim 39 wherein the calibration model is also configured to compensate for non-quantified sample presentation variance.
44. The method of claim 42 wherein the calibration model is also configured to compensate for non-quantified sample presentation variance.
45. The method of claim 1 wherein the calibration model is configured to compensate for non-quantified variations in orientation of an excitation source relative to the sample.
46. The method of claim 1 wherein the calibration model is configured to compensate for non-quantified variations in age of an excitation source.
47. The method of claim 1 wherein the calibration model is configured to compensate for non-quantified variations in mechanical alignment of optical components in the data acquisition device.
48. The method of claim 1 wherein the calibration model is configured to compensate for non-quantified variations in the output of a detector.
49. The method of claim 1 wherein the calibration model is configured to compensate for non-quantified variations in the output of an excitation source.
50. The method of claim 1 wherein the calibration model is configured to compensate for non-quantified variations in transmission of a data acquisition probe.
51. The method of claim 1 wherein at least a portion of the measurement data is stored in a database of the central processor.
52. The method of claim 1 wherein at least a portion of the measurement results is stored in a database of the central processor.
53. The method of claim 1 wherein two or more measurement results acquired from at least one data acquisition device are transmitted to at least one user interface as aggregated results.
54. The method of claim 19 wherein the preprocessing uses a background spectrum.
55. The method of claim 54 wherein the background spectrum is acquired from a single data acquisition device.
56. The method of claim 55 wherein the background spectrum is stored at a location, and is used to generate more than one predicted value of a property of interest using the data acquisition device.
57. The method of claim 54 wherein the background spectrum is generated from an accumulation of background spectra acquired from one or more data acquisition devices.
58. The method of claim 57 wherein the background spectrum is stored at a location, and is used to generate more than one predicted value of a property of interest using the data acquisition device.
59. The method of claim 7 wherein the data are transmitted over a communication link to a central processor.
60. The method of claim 59 wherein the data is acquired using a data acquisition device at a location remote from the central processor.
61. The method of claim 59 wherein the calibration model is configured to compensate for variance in more than one data acquisition device connectable to the central processor by the communication link.
62. The method of claim 7 further comprising pretreating the data.
63. The method of claim 59 further comprising pretreating the data after transmission over the communication link.
64. The method of claim 7 further comprising pre-processing the data.
65. The method of claim 7 wherein the data are pretreated with a refined filter.
66. The method of claim 59 wherein the data are pretreated with a refined filter after transmission over a communication link.
67. The method of claim 65 wherein the data are pre-processed.
68. The method of claim 66 wherein the data are pre-processed.
69. The method of claim 7 wherein the data are the result from electromagnetic radiation detected from the sample.
70. The method of claim 56 wherein the background spectrum is stored in a local processor connected to the data acquisition device.
71. The method of claim 56 wherein the background spectrum is stored in the central processor.
72. The method of claim 56 wherein the background spectrum is stored at a location remote from the data acquisition device, and connected to the data acquisition device by a communication link.
73. The method of claim 1 wherein parameters defining a calibration model are transmitted from a central processor along a communication link to a local processor.
74. The method of claim 1 wherein parameters defining a property model are transmitted from a central processor along a communication link to a local processor.
75. The method of claim 73 wherein the parameters defining the calibration model are transmitted to the local processor of at least one data acquisition device remote from the central processor.
76. The method of claim 74 wherein the parameters defining the calibration model are transmitted to the local processor of at least one data acquisition device remote from the central processor.
77. A system for analyzing a material by predicting a value of a property of interest of the material comprising: at least one data acquisition device for obtaining data on a sample of the material; and a central processor connectable to the data acquisition device over a communication link, the central processor loaded with a calibration model configured to compensate instrument variance in predicting the value of the property of interest.
78. The system of claim 77 further comprising a local processor to receive the predicted values for the property of interest.
79. The system of claim 78 wherein the local processor outputs the predicted values for the property of interest to a user interface.
80. The system of claim 77 wherein a user interface is located in the vicinity of the data acquisition device.
81. The system of claim 77 wherein the data acquisition device is in a location geographically removed from the central processor.
82. The system of claim 77 wherein the data acquisition device includes a local processor for pre-processing the data prior to transmitting the data to the central processor.
83. The system of claim 77 wherein the communication link is an Internet connection.
84. The system of claim 77 wherein the communication link is a private link.
85. The system of claim 78 wherein the local processor is configured to pre-process the data.
86. The system of claim 77 wherein the central processor is configured to pretreat the data.
87. The system of claim 77 wherein the central processor is configured to pretreat the data with a refined filter.
88. The system of claim 78 wherein the local processor is configured to pretreat the data with a refined filter.
89. The system of claim 78 wherein a user interface is located in the vicinity of the data acquisition device.
90. The system of claim 78 wherein the data acquisition device is in a location geographically removed from the central processor.
91. The system of claim 78 wherein the data acquisition device includes a local processor for pre-processing the data prior to transmitting the data to the central processor.
92. The system of claim 78 wherein the communication link is an Internet connection.
93. The system of claim 78 wherein the communication link is a private link.
94. The system of claim 78 wherein the central processor is configured to pretreat the data.
95. The system of claim 78 wherein the central processor is configured to pretreat the data with a refined filter.
96. The system of claim 77 wherein the communication link is a combination of public and private links.
97. The system of claim 78 wherein the communication link is a combination of public and private links.
 98. A method of generating a calibration model for predicting a value of a property of interest from data acquired on an unknown sample of material comprising: obtaining a preliminary model for predicting the property of interest, the model developed from a training set using at least one instrument; identifying at least one factor which may influence the predictive ability of the preliminary model for the property of interest; determining the at least one factor which influences the predictive ability of the preliminary model outside a limit of defined precision; and revising the preliminary model to compensate for variation in the at least one factor which influences the property of interest to generate the calibration model, the models predicting the value of the property of interest within the limits of defined precision.
 99. The method of claim 98 further comprising pretreating at least a portion of data in the training set.
 100. The method of claim 99 further wherein pretreating at least a portion of the training set precedes creating the preliminary model.
 101. The method of claim 99 further wherein pretreating at least a portion of the training set follows creating the preliminary model.
 102. The method of claim 99 wherein pretreating at least a portion of the training set comprises mathematically transforming the training set.
 103. The method of claim 99 wherein pretreating at least a portion of the training set comprises filtering the training set.
 104. A method of updating a calibration model to compensate for a new influential factor for predicting a property of interest, comprising: obtaining a calibration model for predicting the property of interest, the model developed from a training set using at least one instrument; identifying a new factor which may influence the predictive ability of the generated calibration model for the property of interest using a validation set spanning at least a portion of a range of the new factor; calculating the RMSEP of the validation set to determine whether the new factor is influential; determining whether the new influential factor causes the predicted ability of the generated calibration model to fall outside a limit of precision defined in terms of RMSEP; and updating the generated calibration model to compensate for variance in the new influential factor to generate a revised calibration model having a predictive ability within the limit of precision.
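The RMSEP test recited in claim 104 can be illustrated with a minimal sketch. It assumes the conventional definition of RMSEP (the square root of the mean squared difference between predicted and reference values over the validation set); the function names `rmsep` and `factor_is_influential`, and the choice to treat a value exactly at the limit as acceptable, are assumptions made for illustration and are not taken from the claims.

```python
import math


def rmsep(predicted, reference):
    """Root mean square error of prediction over a validation set."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("need two equal-length, non-empty sequences")
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def factor_is_influential(predicted, reference, precision_limit):
    """Assumed rule: a factor is influential when the RMSEP of a validation
    set spanning that factor exceeds the limit of precision."""
    return rmsep(predicted, reference) > precision_limit


# Hypothetical validation set that spans the new factor.
predicted = [1.0, 2.0, 3.0]
reference = [1.0, 2.0, 5.0]
print(rmsep(predicted, reference))
print(factor_is_influential(predicted, reference, precision_limit=0.5))
```

When the check returns true, the claim calls for updating the calibration model so that its predictive ability returns to within the limit of precision.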
 105. A method of updating a calibration model to compensate for a modified range of an existing factor for predicting a property of interest, comprising: obtaining a calibration model for predicting the property of interest, the model developed from a training set using at least one instrument; identifying the modified range of the existing factor which may influence the predictive ability of the generated calibration model for the property of interest using a validation set spanning at least a portion of the modified range; calculating the RMSEP of the validation set to determine whether the new factor is influential; determining whether the modified range causes the predicted ability of the generated calibration model to fall outside a limit of precision defined in terms of RMSEP; and updating the generated calibration model to compensate for variance in the new influential factor to generate a revised calibration model having a predictive ability within the limit of precision.
 106. A method of updating a calibration model to compensate for a new influential factor for predicting a property of interest, comprising: obtaining a calibration model for predicting the property of interest, the model developed from a first training set using at least one instrument; identifying a new factor which may influence the predictive ability of the generated calibration model for the property of interest using a second training set spanning at least a portion of a range of the new factor; calculating the RMSECV of the validation set to determine whether the new factor is influential; determining whether the new influential factor causes the predicted ability of the generated calibration model to fall outside a limit of precision defined in terms of RMSECV; and updating the generated calibration model to compensate for variance in the new influential factor to generate a revised calibration model having a predictive ability within the limit of precision.
 107. A method of updating a calibration model to compensate for a modified range of an existing factor for predicting a property of interest, comprising: obtaining a calibration model for predicting the property of interest, the model developed from a first training set using at least one instrument; identifying the modified range of the existing factor which may influence the predictive ability of the generated calibration model for the property of interest using a second training set spanning at least a portion of the modified range; calculating the RMSECV of the validation set to determine whether the new factor is influential; determining whether the modified range causes the predicted ability of the generated calibration model to fall outside a limit of precision defined in terms of RMSECV; and updating the generated calibration model to compensate for variance in the new influential factor to generate a revised calibration model having a predictive ability within the limit of precision.
 108. A method of defining at least one acceptable region of data wherein the data are generated from an instrument response prior to evaluation of the data by a calibration model to determine if pretreatment can compensate for an influential factor in predicting a property of interest using the calibration model, comprising: identifying at least one trial region containing at least one subregion of data within an entire region of data from a plurality of trial regions comprising discrete combinations of subregions; evaluating the calibration model for each identified trial region using a validation set spanning at least a portion of a range of the influential factor; calculating a RMSEP for each identified trial region; and selecting at least one acceptable region from the at least one identified trial region having the RMSEP within a limit of precision identified in terms of RMSEP.
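The region search of claim 108 can be sketched as follows. The sketch assumes that the trial regions are every non-empty combination of a fixed list of subregions, and that a caller-supplied function (here the hypothetical `rmsep_for_region`) evaluates the calibration model on the validation set restricted to a given trial region. Both are illustrative assumptions rather than details taken from the claims.

```python
from itertools import combinations


def trial_regions(subregions):
    """Yield every non-empty combination of subregions as a tuple."""
    for size in range(1, len(subregions) + 1):
        yield from combinations(subregions, size)


def acceptable_regions(subregions, rmsep_for_region, precision_limit):
    """Return (region, rmsep) pairs whose RMSEP is within the limit."""
    accepted = []
    for region in trial_regions(subregions):
        error = rmsep_for_region(region)  # hypothetical evaluation hook
        if error <= precision_limit:
            accepted.append((region, error))
    return accepted


# Toy stand-in for evaluating the model on a validation set: a lookup table
# of made-up RMSEP values, one per trial region.
toy_rmsep = {("A",): 0.9, ("B",): 0.3, ("A", "B"): 0.4}
print(acceptable_regions(["A", "B"], toy_rmsep.__getitem__, precision_limit=0.5))
```

An exhaustive enumeration of n subregions produces 2^n - 1 trial regions, so a practical search would likely restrict the combinations that are tried.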
 109. A method of defining a refined filter using a validation set, comprising: identifying at least one acceptable filter for the validation set; selecting a set of at least one acceptable filter wherein each filter within the set has the lowest number of subregions; selecting a subset of the set wherein each filter within the subset has the lowest rank; and defining the refined filter as the filter within the subset which further has the lowest RMSEP calculated from the validation set.
 110. A method of defining a refined filter using a validation set, comprising: identifying at least one acceptable filter for the validation set; selecting a set of at least one acceptable filter wherein each filter within the set has the lowest rank; selecting a subset of the set wherein each filter within the subset has the lowest number of subregions; and defining the refined filter as the filter within the subset which further has the lowest RMSEP calculated from the validation set.
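Claims 109 and 110 describe the same tie-breaking cascade with the first two criteria swapped. A minimal sketch, assuming each acceptable filter is summarized by its number of subregions, the rank of the calibration model, and its RMSEP; the `Filter` record and the `refined_filter` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Filter:
    name: str
    n_subregions: int
    rank: int
    rmsep: float


def refined_filter(acceptable, order=("n_subregions", "rank")):
    """Narrow the acceptable filters by each criterion in `order`,
    then return the survivor with the lowest RMSEP."""
    candidates = list(acceptable)
    if not candidates:
        raise ValueError("no acceptable filters supplied")
    for attribute in order:
        best = min(getattr(f, attribute) for f in candidates)
        candidates = [f for f in candidates if getattr(f, attribute) == best]
    return min(candidates, key=lambda f: f.rmsep)


# Made-up acceptable filters.
filters = [
    Filter("a", n_subregions=2, rank=3, rmsep=0.10),
    Filter("b", n_subregions=1, rank=4, rmsep=0.30),
    Filter("c", n_subregions=1, rank=2, rmsep=0.25),
    Filter("d", n_subregions=1, rank=2, rmsep=0.20),
    Filter("e", n_subregions=3, rank=1, rmsep=0.05),
]
print(refined_filter(filters).name)                                   # claim 109 ordering
print(refined_filter(filters, order=("rank", "n_subregions")).name)  # claim 110 ordering
```

Passing `order=("rank",)` would correspond to the shorter cascade of claim 111, which omits the subregion criterion.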
 111. A method of defining a refined filter using a validation set, comprising: identifying at least one acceptable filter for the validation set; selecting a set of at least one acceptable filter wherein each filter within the set has the lowest rank; defining the refined filter as the filter within the set which further has the lowest RMSEP calculated from the validation set.
 112. A method of defining at least one acceptable region of data wherein the data are generated from an instrument response prior to evaluation of the data by a calibration model to determine if pretreatment can compensate for an influential factor in predicting a property of interest using the calibration model, comprising: identifying at least one trial region containing at least one subregion of data within an entire region of data from a plurality of trial regions comprising discrete combinations of subregions; evaluating the calibration model for each identified trial region using a training set spanning at least a portion of a range of the influential factor; calculating a RMSECV for each identified trial region; and selecting at least one acceptable region from the at least one identified trial region having the RMSECV within a limit of precision identified in terms of RMSECV.
 113. A method of defining a refined filter using a training set, comprising: identifying at least one acceptable filter for the training set; selecting a set of at least one acceptable filter wherein each filter within the set has the lowest number of subregions; selecting a subset of the set wherein each filter within the subset has the lowest rank; and defining the refined filter as the filter within the subset which further has the lowest RMSECV calculated from the training set.
 114. A method of defining a refined filter using a training set, comprising: identifying at least one acceptable filter for the training set; selecting a set of at least one acceptable filter wherein each filter within the set has the lowest rank; selecting a subset of the set wherein each filter within the subset has the lowest number of subregions; and defining the refined filter as the filter within the subset which further has the lowest RMSECV calculated from the training set.
 115. A method of defining a refined filter using a training set, comprising: identifying at least one acceptable filter for the training set; selecting a set of at least one acceptable filter wherein each filter within the set has the lowest rank; defining the refined filter as the filter within the set which further has the lowest RMSECV calculated from the training set.
 116. A method of revising a calibration model developed from a training set for a calibration model development for predicting a value of a property of interest comprising: detecting at least one probable outlier in the training set; identifying at least one good probable outlier from a group of at least one detected probable outlier; and extending the training set using the good probable outlier to develop a revised calibration model.
 117. The method of claim 116 further comprising detecting at least one probable outlier using Mahalanobis distance and a first threshold value as a lower limit for identifying the probable outlier.
 118. The method of claim 116 further comprising identifying at least one good probable outlier using Mahalanobis distance, a first threshold value as a lower limit for identifying the probable outlier, and a second threshold value as an upper limit for identifying the good probable outlier.
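The two-threshold test of claims 117 and 118 can be sketched in plain Python. The sketch assumes the standard Mahalanobis distance, sqrt((x - m)^T S^-1 (x - m)), computed against the mean m and sample covariance S of the training set. The helper names, the strictness of each comparison, and the returned labels are assumptions made for illustration.

```python
import math


def mean_and_covariance(rows):
    """Column means and sample covariance matrix of a list of observations."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def mahalanobis(point, mean, cov):
    """Mahalanobis distance of a point from a distribution (mean, cov)."""
    diff = [p - mu for p, mu in zip(point, mean)]
    return math.sqrt(sum(d * z for d, z in zip(diff, solve(cov, diff))))


def classify(distance, first_threshold, second_threshold):
    """Assumed rule: above the first threshold is a probable outlier; a
    probable outlier at or below the second threshold is a good one."""
    if distance <= first_threshold:
        return "inlier"
    if distance <= second_threshold:
        return "good probable outlier"
    return "rejected probable outlier"


d = mahalanobis([3.0, 4.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 4.0]])
print(d, classify(d, first_threshold=3.0, second_threshold=5.0))
```

Under this reading, only observations labeled as good probable outliers would be used to extend the training set.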
 119. A method of revising a calibration model developed from a training set for calibration model development for predicting a value of a property of interest comprising: detecting at least one probable outlier in a validation set; identifying at least one good probable outlier from a group of at least one detected probable outlier; and extending the training set using the good probable outlier to develop a revised calibration model.
 120. The method of claim 119 further comprising detecting at least one probable outlier using Mahalanobis distance and a first threshold value as a lower limit for identifying the probable outlier.
 121. The method of claim 119 further comprising identifying at least one good probable outlier using Mahalanobis distance, a first threshold value as a lower limit for identifying the probable outlier, and a second threshold value as an upper limit for identifying the good probable outlier.
 122. A method of revising a calibration model developed from a training set for use in predicting a value of a property of interest on at least one unknown sample comprising: detecting at least one probable outlier in at least one predicted value from measurements on the at least one unknown sample; identifying at least one good probable outlier from a group comprising at least one detected probable outlier; and extending the training set using the good probable outlier to develop a revised calibration model.
 123. The method of claim 122 further comprising detecting at least one probable outlier using Mahalanobis distance and a first threshold value as a lower limit for identifying the probable outlier.
 124. The method of claim 122 further comprising identifying at least one good probable outlier using Mahalanobis distance, a first threshold value as a lower limit for identifying the probable outlier, and a second threshold value as an upper limit for identifying the good probable outlier.
 125. A method of revising a calibration model developed from a training set for a calibration model development for predicting a value of a property of interest comprising: detecting at least one probable outlier in the training set; identifying at least one good probable outlier from a group of at least one detected probable outlier; and improving the training set by replacing one or more observations in the training set using the good probable outlier to develop a revised calibration model.
 126. The method of claim 125 further comprising detecting at least one probable outlier using Mahalanobis distance and a first threshold value as a lower limit for identifying the probable outlier.
 127. The method of claim 125 further comprising identifying at least one good probable outlier using Mahalanobis distance, a first threshold value as a lower limit for identifying the probable outlier, and a second threshold value as an upper limit for identifying the good probable outlier.
 128. A method of revising a calibration model developed from a training set for calibration model development for predicting a value of a property of interest comprising: detecting at least one probable outlier in a validation set; identifying at least one good probable outlier from a group of at least one detected probable outlier; and improving the training set by replacing one or more observations in the training set using the good probable outlier to develop a revised calibration model.
 129. The method of claim 128 further comprising detecting at least one probable outlier using Mahalanobis distance and a first threshold value as a lower limit for identifying the probable outlier.
 130. The method of claim 128 further comprising identifying at least one good probable outlier using Mahalanobis distance, a first threshold value as a lower limit for identifying the probable outlier, and a second threshold value as an upper limit for identifying the good probable outlier.
 131. A method of revising a calibration model developed from a training set for use in predicting a value of a property of interest on at least one unknown sample comprising: detecting at least one probable outlier in at least one predicted value from measurements on the at least one unknown sample; identifying at least one good probable outlier from a group comprising at least one detected probable outlier; and improving the training set by replacing one or more observations in the training set using the good probable outlier to develop a revised calibration model.
 132. The method of claim 131 further comprising detecting at least one probable outlier using Mahalanobis distance and a first threshold value as a lower limit for identifying the probable outlier.
 133. The method of claim 131 further comprising identifying at least one good probable outlier using Mahalanobis distance, a first threshold value as a lower limit for identifying the probable outlier, and a second threshold value as an upper limit for identifying the good probable outlier.
 134. A method of evaluating an instrument for acceptability as a data acquisition device in combination with a calibration model such that a limit of precision is met by the instrument in combination with the calibration model in generating a result, comprising: generating a validation set with the instrument in combination with the calibration model; computing a RMSEP value from the validation set; and accepting the instrument if the RMSEP value is within the limit of precision.
 135. A method of evaluating a component of an instrument for acceptability as a data acquisition device in combination with a calibration model such that a limit of precision is met by the instrument in combination with the calibration model in generating a result, comprising: generating a validation set with the component of the instrument in combination with the calibration model; computing a RMSEP value from the validation set; and accepting the component of the instrument if the RMSEP value is within the limit of precision.
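The acceptance test of claims 134 and 135 reduces to one comparison per instrument or component. A minimal sketch, assuming each candidate has produced a validation set of predicted values (generated with the shared calibration model) paired with reference values; the function and instrument names are hypothetical, and the numbers are made up for illustration.

```python
import math


def rmsep(predicted, reference):
    """Root mean square error of prediction (conventional definition)."""
    squared_errors = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def evaluate_instruments(validation_sets, precision_limit):
    """Map each instrument name to (RMSEP, accepted) for one shared model."""
    report = {}
    for name, (predicted, reference) in validation_sets.items():
        error = rmsep(predicted, reference)
        report[name] = (error, error <= precision_limit)
    return report


# Made-up validation sets for two hypothetical instruments.
validation_sets = {
    "unit_a": ([10.1, 9.9], [10.0, 10.0]),
    "unit_b": ([11.0, 9.0], [10.0, 10.0]),
}
for name, (error, accepted) in evaluate_instruments(validation_sets, 0.5).items():
    print(name, round(error, 3), "accepted" if accepted else "rejected")
```

Because the calibration model is held fixed, a rejection in this sketch reflects on the instrument or component rather than on the model.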
 136. A method of providing analysis services, comprising: providing analysis services on behalf of a plurality of customers using a plurality of data acquisition devices connected to a central processor into which is loaded at least one calibration model configured to generate a predicted result of a property of interest from data acquired from a plurality of samples using the data acquisition devices, wherein providing analysis services includes transmitting the predicted value of the property of interest to a customer for which analysis services is required for a particular sample of a material.
 137. The method of claim 136 further comprising updating one or more calibration models for all of the data acquisition devices from time to time in response to detecting good outliers when processing data acquired on a plurality of samples by at least one data acquisition device.
 138. The method of claim 136, further comprising receiving a fee from a customer in response to providing analysis services on behalf thereof.
 139. A program product, comprising: a program configured to define at least one acceptable region of data, wherein the data are generated from an instrument response prior to evaluation of the data by a calibration model to determine if pretreatment can compensate for an influential factor in predicting a property of interest using the calibration model, by identifying at least one trial region containing at least one subregion of data within an entire region of data from a plurality of trial regions comprising discrete combinations of subregions; evaluating the calibration model for each identified trial region using a validation set spanning at least a portion of a range of the influential factor; calculating a RMSEP for each identified trial region; and selecting at least one acceptable region from the at least one identified trial region having the RMSEP within a limit of precision identified in terms of RMSEP, wherein the at least one acceptable region is selected based on a comparative evaluation of RMSEP, number of subregions, and rank of the calibration model, wherein lower values for each of RMSEP, number of regions and rank are preferred; and a signal bearing medium bearing the program.
 140. The method of claim 139 wherein the at least one acceptable region is selected based on lowest RMSEP.
 141. The method of claim 139 wherein the at least one acceptable region is selected based on lowest number of subregions.
 142. The method of claim 139 wherein the at least one acceptable region is selected based on lowest rank of the calibration model.
 143. A program product comprising: a program configured to predict a value of a property of interest of a material by using a calibration model to process data obtained from a sample of the material, wherein the calibration model is configured to compensate for instrument variance in predicting the value of the property of interest; and a signal bearing medium bearing the program.
 144. The method of claim 143 wherein the calibration model is also configured to compensate for sample variance.
 145. The method of claim 143 wherein the calibration model is also configured to compensate for environmental variance.
 146. The method of claim 143 wherein the calibration model is also configured to compensate for sample presentation variance.
 147. The method of claim 144 wherein the calibration model is also configured to compensate for environmental variance.
 148. The method of claim 144 wherein the calibration model is also configured to compensate for sample presentation variance.
 149. The method of claim 147 wherein the calibration model is also configured to compensate for sample presentation variance.
 150. The program product of claim 143, wherein the signal bearing medium includes at least one of a transmission medium and a recordable medium.
 151. A program product comprising: a program configured to generate a calibration model for predicting a value of a property of interest from data acquired on an unknown sample of material by obtaining a preliminary model for predicting the property of interest, the model developed from a training set using at least one instrument; identifying at least one factor which may influence the predictive ability of the preliminary model for the property of interest; determining the at least one factor which influences the predictive ability of the preliminary model outside a limit of defined precision; and revising the preliminary model to compensate for variation in the at least one factor which influences the property of interest to generate the calibration model, the models predicting the value of the property of interest within the limits of defined precision; and a signal bearing medium bearing the program.
 152. The method of claim 151 further comprising pretreating at least a portion of data in the training set.
 153. The method of claim 152 further wherein pretreating at least a portion of the training set precedes creating the preliminary model.
 154. The method of claim 152 further wherein pretreating at least a portion of the training set follows creating the preliminary model.
 155. The method of claim 152 wherein pretreating at least a portion of the training set comprises mathematically transforming the training set.
 156. The method of claim 152 wherein pretreating at least a portion of the training set comprises filtering the training set.
 157. A program product comprising: a program configured to evaluate a component of an instrument for acceptability as a data acquisition device in combination with a calibration model such that a limit of precision is met by the instrument in combination with the calibration model in generating a result, by generating a validation set with the component of the instrument in combination with the calibration model; computing a RMSEP value from the validation set; and accepting the component of the instrument if the RMSEP value is within the limit of precision; and a signal bearing medium bearing the program.
 158. An apparatus, comprising: at least one microprocessor; and a program configured to execute on the at least one microprocessor to define at least one acceptable region of data, wherein the data are generated from an instrument response prior to evaluation of the data by a calibration model to determine if pretreatment can compensate for an influential factor in predicting a property of interest using the calibration model, by identifying at least one trial region containing at least one subregion of data within an entire region of data from a plurality of trial regions comprising discrete combinations of subregions; evaluating the calibration model for each identified trial region using a validation set spanning at least a portion of a range of the influential factor; calculating a RMSEP for each identified trial region; and selecting at least one acceptable region from the at least one identified trial region having the RMSEP within a limit of precision identified in terms of RMSEP, wherein the at least one acceptable region is selected based on a comparative evaluation of RMSEP, number of subregions, and rank of the calibration model, wherein lower values for each of RMSEP, number of regions and rank are preferred.
 159. The apparatus of claim 158 wherein the at least one acceptable region is selected based on lowest RMSEP.
 160. The apparatus of claim 158 wherein the at least one acceptable region is selected based on lowest number of subregions.
 161. The apparatus of claim 158 wherein the at least one acceptable region is selected based on lowest rank of the calibration model.
 162. An apparatus, comprising: at least one microprocessor; and a program configured to execute on the at least one microprocessor to predict a value of a property of interest of a material by using a calibration model to process data obtained from a sample of the material, wherein the calibration model is configured to compensate for instrument variance in predicting the value of the property of interest.
 163. The apparatus of claim 162 wherein the calibration model is also configured to compensate for sample variance.
 164. The apparatus of claim 162 wherein the calibration model is also configured to compensate for environmental variance.
 165. The apparatus of claim 162 wherein the calibration model is also configured to compensate for sample presentation variance.
 166. The apparatus of claim 163 wherein the calibration model is also configured to compensate for environmental variance.
 167. The apparatus of claim 163 wherein the calibration model is also configured to compensate for sample presentation variance.
 168. The apparatus of claim 166 wherein the calibration model is also configured to compensate for sample presentation variance.
 169. An apparatus, comprising: at least one microprocessor; and a program configured to execute on the at least one microprocessor to generate a calibration model for predicting a value of a property of interest from data acquired on an unknown sample of material by obtaining a preliminary model for predicting the property of interest, the model developed from a training set using at least one instrument; identifying at least one factor which may influence the predictive ability of the preliminary model for the property of interest; determining the at least one factor which influences the predictive ability of the preliminary model outside a limit of defined precision; and revising the preliminary model to compensate for variation in the at least one factor which influences the property of interest to generate the calibration model, the models predicting the value of the property of interest within the limits of defined precision.
 170. The apparatus of claim 169 further comprising pretreating at least a portion of data in the training set.
 171. The apparatus of claim 170 further wherein pretreating at least a portion of the training set precedes creating the preliminary model.
 172. The apparatus of claim 170 further wherein pretreating at least a portion of the training set follows creating the preliminary model.
 173. The apparatus of claim 170 wherein pretreating at least a portion of the training set comprises mathematically transforming the training set.
 174. The apparatus of claim 170 wherein pretreating at least a portion of the training set comprises filtering the training set.
 175. An apparatus, comprising: at least one microprocessor; and a program configured to execute on the at least one microprocessor to evaluate a component of an instrument for acceptability as a data acquisition device in combination with a calibration model such that a limit of precision is met by the instrument in combination with the calibration model in generating a result, by generating a validation set with the component of the instrument in combination with the calibration model; computing a RMSEP value from the validation set; and accepting the component of the instrument if the RMSEP value is within the limit of precision.
 176. An apparatus comprising: a memory; a calibration model resident in the memory and configured to compensate for instrument variance in predicting a value of a property of interest for a material; and a program configured to process data obtained from a sample of the material by using the calibration model, wherein the calibration model is configured to compensate for instrument variance in predicting the value of the property of interest.
 177. An apparatus comprising: a memory; a calibration model resident in the memory for predicting a value of a property of interest from data acquired on an unknown sample of material; and a program configured to generate the calibration model by obtaining a preliminary model for predicting the property of interest, the model developed from a training set using at least one instrument; identifying at least one factor which may influence the predictive ability of the preliminary model for the property of interest; determining the at least one factor which influences the predictive ability of the preliminary model outside a limit of defined precision; and revising the preliminary model to compensate for variance in the at least one factor which influences the property of interest to generate the calibration model, the models predicting the value of the property of interest within the limits of defined precision.
 178. An apparatus, comprising: a memory; a calibration model resident in the memory and configured for use in evaluating data; and a program configured to define at least one trial region of data wherein the data are generated from an instrument response prior to evaluation of the data by the calibration model to determine if pretreatment can compensate for an influential factor in predicting a property of interest using the calibration model by identifying at least one trial region containing at least one subregion of data within an entire region of data from a plurality of trial regions comprising discrete combinations of subregions; evaluating the calibration model for each identified trial region using a validation set spanning at least a portion of a range of the influential factor; calculating a RMSEP for each identified trial region; and selecting at least one identified trial region used for generating a RMSEP within a limit of precision defined in terms of RMSEP, wherein the at least one trial region is selected based on a comparative evaluation of RMSEP, number of subregions, and rank of the calibration model, wherein lower values for each of RMSEP, number of subregions and rank are preferred.
 179. An apparatus, comprising: a memory; a calibration model resident in the memory; and a program configured to evaluate a component of an instrument for acceptability as a data acquisition device in combination with the calibration model such that a limit of precision is met by the instrument in combination with the calibration model in generating a result, by generating a validation set with a component of the instrument in combination with the calibration model; computing a RMSEP value from the validation set; and accepting the component of the instrument if the RMSEP value is within the limit of precision.